All rights are reserved with the Punjab Curriculum & Textbook Board Lahore. 
Prepared by Punjab Curriculum & Textbook Board Lahore. 
Approved by: Federal Ministry of Education, Curriculum Wing, Islamabad. 


Chapter 













10 NORMAL DISTRIBUTION
11 SAMPLING TECHNIQUES AND SAMPLING DISTRIBUTIONS
ESTIMATION
CORRELATION
ASSOCIATION
16 ANALYSIS OF TIME SERIES
APPENDIX A SAMPLING DISTRIBUTIONS FROM NORMAL POPULATIONS
APPENDIX B STATISTICAL TABLES












Author: Supervised by: 
Prof. Muhammad Rauf Chaudhary Madiha Mehmood 
Govt. College for Boys, Subject specialist, PCTB 
Gujranwala. 
Editors: Mr. Mazhar Hayat 
Subject specialist, PCTB 
Prof. Muhammad Khalid
Ex-Director (Technical), PCTB
Artist: Ayesha Wahee
Director (Manuscripts): Mrs. Nisar Qamar
Publisher: Bright Way Publisher, Lahore.
Printer: Qudrat ullah Printers, Lahore.
Date of Printing: Nov. 2019   Edition: 1st   Impression: 16th   Copies: 7,000   Price: 142.00










NORMAL 
DISTRIBUTION 





10 


10.1 NORMAL DISTRIBUTION 


The normal distribution is undoubtedly the most important and frequently used of all
probability laws, because
(i) the normal random variable occurs frequently in practical problems such as heights
and weights of individuals, I.Q. scores, errors of measurement, etc.
(ii) it is the limiting form of many other probability laws and hence provides an accurate
approximation to them.
(iii) it is also the limiting distribution in the well-known central limit theorem (as discussed
in Theorem 11.6).
Thus a great many techniques used in applied statistics are based on the normal
distribution. The formal definition follows.
10.1.1 Normal Probability Density Function. A continuous random variable X is normally
distributed if and only if its probability density function is

$$f(x) = n(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} \qquad \text{for } -\infty < x < \infty$$

where $\mu$ is any real number (i.e., $-\infty < \mu < \infty$) and $\sigma$ must be positive (i.e., $\sigma > 0$),
$\pi$ (= 3.141592654...) and $e$ (= 2.718281828...) are constants, x is the value of the
random variable X and f(x) is the density (ordinate) at X = x.

A normal distribution is characterized by two parameters $\mu$ and $\sigma$, its mean and
standard deviation respectively. Sometimes it is denoted by $N(\mu, \sigma^2)$. Thus

$$X \sim N(\mu, \sigma^2)$$

means that a random variable X is normally distributed with mean $\mu$ and variance $\sigma^2$.


10.1.2 Shape of Normal Distribution. The graph of a normal probability density function is
called a normal curve. As can be seen from the definition, the probability density function for a
normal random variable has a unimodal, symmetrical and bell shaped distribution.

Fig. 10.1 Normal distribution

Figure 10.2 shows the graphs of some typical normal density functions for various
values of the parameters $\mu$ and $\sigma$. The parameter $\sigma$ controls the relative flatness of the curve.








Statistics — Part Il 





(i) Keeping $\mu$ constant and decreasing $\sigma$ causes the density function to become more
sharply peaked, thus giving higher probabilities of X being close to $\mu$.
(ii) Keeping $\mu$ constant and increasing $\sigma$ causes the density function to flatten, thus giving
lower probabilities of X being close to $\mu$.
(iii) If $\sigma$ is held constant and $\mu$ is varied, the shape of the density function remains the
same with its centre moving to the location of $\mu$.
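
The effect of the two parameters can be checked numerically. The sketch below is only an illustration: it assumes Python's standard `statistics.NormalDist` class, and the helper name `peak_and_central_prob` and the values $\mu = 5$, $\sigma = 0.5, 1, 2$ are chosen here for demonstration, not taken from the text.

```python
from statistics import NormalDist

def peak_and_central_prob(mu, sigma, half_width=1.0):
    """Return (maximum ordinate, P(mu - half_width < X < mu + half_width))."""
    dist = NormalDist(mu, sigma)
    peak = dist.pdf(mu)  # the density is highest at x = mu
    central = dist.cdf(mu + half_width) - dist.cdf(mu - half_width)
    return peak, central

# Illustrative parameter values (assumed, not from the text)
for sigma in (0.5, 1.0, 2.0):
    peak, central = peak_and_central_prob(5.0, sigma)
    print(f"sigma={sigma}: peak={peak:.4f}, P(|X - mu| < 1)={central:.4f}")
```

A smaller $\sigma$ gives a higher peak and a larger probability near $\mu$; changing $\mu$ alone leaves both numbers unchanged.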





Fig. 10.2 Normal probability density functions


10.1.3 Properties of Normal Distribution. The following are the main properties of normal
distribution (or curve).
(1) Continuous Distribution. The normal probability distribution is a continuous
distribution that ranges from $-\infty$ to $+\infty$.
$$R_X = \{x : -\infty < x < +\infty\}$$
(2) Total Probability. The total area under the normal curve is unity. That is,
$$P(-\infty < X < +\infty) = 1$$
(3) Mode and Maximum Ordinate. The normal probability density function is unimodal
(single peaked), its mode is $\mu$ and its maximum ordinate at $x = \mu$ is
$$f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{\mu-\mu}{\sigma}\right)^{2}} = \frac{1}{\sigma\sqrt{2\pi}}$$
(4) Symmetrical Distribution. The normal probability distribution is a symmetrical
distribution. Thus
(i) The mean, median and mode coincide at $\mu$.
Mean = Median = Mode = $\mu$
(ii) The lower quartile $x_{0.25}$ or $Q_1$ and the upper quartile $x_{0.75}$ or $Q_3$ are
equidistant from its mean $\mu$.
$$\mu - x_{0.25} = x_{0.75} - \mu$$
(iii) All odd order moments about mean are zero, i.e.,
$$\mu_1 = \mu_3 = \mu_5 = \cdots = 0$$





(5) Special areas under the curve. No matter what the values of $\mu$ and $\sigma$ are, the areas under
the normal curve remain in fixed proportions within a specified number of $\sigma$ on either
side of $\mu$. For example,
(i) $P(\mu - \sigma < X < \mu + \sigma) = 0.6827$
(ii) $P(\mu - 2\sigma < X < \mu + 2\sigma) = 0.9545$
(iii) $P(\mu - 3\sigma < X < \mu + 3\sigma) = 0.9973$

(6) Median, Quartiles and Quartile Deviation. In a normal probability distribution, the
median, the lower and upper quartiles and the quartile deviation are
$$\text{Median} = \mu, \qquad x_{0.25} = \mu - 0.6745\,\sigma, \qquad x_{0.75} = \mu + 0.6745\,\sigma$$
$$Q.D(X) = \frac{x_{0.75} - x_{0.25}}{2} = 0.6745\,\sigma \approx \tfrac{2}{3}\,\sigma$$
Hence
$$\mu = \frac{x_{0.25} + x_{0.75}}{2}, \qquad \sigma = \frac{x_{0.75} - x_{0.25}}{1.349}$$

(7) Variance, Standard Deviation and Mean Deviation. In a normal probability
distribution, the variance, the standard deviation and the mean deviation are
$$Var(X) = \sigma^2$$
$$S.D(X) = \sigma$$
$$M.D(X) = \sigma\sqrt{\frac{2}{\pi}} = 0.7979\,\sigma \approx \tfrac{4}{5}\,\sigma$$

(8) Moments and Moment Ratios. In a normal probability distribution, the first four
moments about mean and the moment ratios are
$$\mu_1 = 0, \quad \mu_2 = \sigma^2, \quad \mu_3 = 0, \quad \mu_4 = 3\sigma^4$$
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{0}{(\sigma^2)^3} = 0, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{3\sigma^4}{(\sigma^2)^2} = 3$$

(9) Points of Inflexion. The points of inflexion of the normal probability density function
are equidistant from the mean $\mu$; they are at $x = \mu - \sigma$ and $x = \mu + \sigma$. The normal
probability distribution is a bell shaped distribution.

(10) Asymptotic Curve. The normal curve is asymptotic to the x-axis, that is, as $|x|$ grows without
bound, the curve gets closer and closer to the x-axis but always stays above it. The curve has a
single peak in the middle and tapers off gradually at both ends and never meets the x-axis.

(11) Reproductive Property. If $X_1$ and $X_2$ are two independent normal random variables having
distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ respectively, then their sum $X_1 + X_2$ is also a
normal random variable having the distribution
$$N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2).$$
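
Several of the numerical constants quoted in these properties can be reproduced with a short computation. The following is a sketch only, assuming Python's standard `statistics.NormalDist`; the parameter values $\mu = 50$, $\sigma = 10$ and the helper name `area_within` are illustrative assumptions.

```python
import math
from statistics import NormalDist

mu, sigma = 50.0, 10.0  # illustrative values, not prescribed by the properties
X = NormalDist(mu, sigma)

def area_within(k):
    """P(mu - k*sigma < X < mu + k*sigma)."""
    return X.cdf(mu + k * sigma) - X.cdf(mu - k * sigma)

# Property (5): areas within 1, 2 and 3 standard deviations of the mean
areas = [round(area_within(k), 4) for k in (1, 2, 3)]

# Property (6): quartile deviation, expected to be about 0.6745 * sigma
quartile_deviation = (X.inv_cdf(0.75) - X.inv_cdf(0.25)) / 2

# Property (7): mean deviation sigma * sqrt(2/pi), about 0.7979 * sigma
mean_deviation = sigma * math.sqrt(2 / math.pi)

print(areas, round(quartile_deviation, 3), round(mean_deviation, 3))
```

Because the areas depend only on the number of standard deviations, changing `mu` or `sigma` in the sketch leaves `areas` unchanged.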
10.1.4 Normal Cumulative Distribution Function. The cumulative distribution function for
the normal random variable X is

$$F(x) = P(X \le x) = P(-\infty < X \le x) = \int_{-\infty}^{x} f(u)\, du = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{u-\mu}{\sigma}\right)^{2}} du$$

Unfortunately, this integration cannot be carried out in closed form. Numerical
techniques could be used to evaluate the integral for specific values of $\mu$ and $\sigma$. The various
possible values of $\mu$ and $\sigma$ result in a family of an unlimited number of different normal
distributions. It is, thus, necessary to tabulate the standard normal cumulative distribution
function, which can be used to evaluate the cumulative distribution function for a normal random
variable with any mean $\mu$ and any standard deviation $\sigma$.
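
As an illustration of the numerical techniques mentioned above, the sketch below approximates $F(x)$ by Simpson's rule and compares it with the error-function form $\frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$. The function names, the cut-off at $\mu - 10\sigma$ and the number of steps are assumptions made for this example, not part of the text.

```python
import math

def normal_pdf(x, mu, sigma):
    """Ordinate of the normal curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf_numeric(x, mu, sigma, steps=2000):
    """Approximate F(x) by Simpson's rule on [mu - 10*sigma, x] (steps must be even)."""
    lower = mu - 10 * sigma  # the area below this point is negligible
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    total = normal_pdf(lower, mu, sigma) + normal_pdf(x, mu, sigma)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights 4, 2, 4, ..., 4
        total += weight * normal_pdf(lower + i * h, mu, sigma)
    return total * h / 3

def normal_cdf_erf(x, mu, sigma):
    """F(x) written with the error function from the math module."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

print(normal_cdf_numeric(64, 62, 4), normal_cdf_erf(64, 62, 4))
```

Both calls evaluate $F(64)$ for $\mu = 62$, $\sigma = 4$, which is the same as the tabulated value at $z = 0.5$.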


10.2 STANDARD NORMAL RANDOM VARIABLE 


A normal random variable X with mean $\mu$ and standard deviation $\sigma$ can easily be
transformed into a standard normal random variable Z by the transformation
$$Z = \frac{X - \mu}{\sigma}$$
which has mean 0 and variance 1.
10.2.1 Standard Normal Distribution. If the random variable X has a normal distribution
with mean $\mu$ and variance $\sigma^2$, then the random variable $Z = (X - \mu)/\sigma$ has a standard
normal distribution with mean 0 and variance 1.

Theorem 10.1 If $X \sim N(\mu, \sigma^2)$ and $Z = (X - \mu)/\sigma$, then
$$Z \sim N(0, 1)$$


Since the standard normal probability density function and cumulative distribution
function are of such importance, we shall use special symbols for them.

10.2.2 Standard Normal Probability Density Function. The probability density function of
the standard normal variable Z, denoted by $\varphi(z)$, is given as

$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \qquad \text{for } -\infty < z < \infty$$

Thus $\varphi(z)$ is the value of the standard normal probability density function at Z = z.
Therefore, $\varphi(z)$ is called the ordinate of the standard normal curve at Z = z. The ordinates
$\varphi(z)$ have been tabulated for various values of z in Table 7.

Fig. 10.3 Ordinate of standard normal probability density function at Z = z

Theorem 10.2 If $Z \sim N(0, 1)$, then
$$\varphi(-z) = \varphi(z)$$





10.2.3 Standard Normal Cumulative Distribution Function. The cumulative distribution
function of the standard normal variable Z, denoted by $\Phi(z)$, is given as

$$\Phi(z) = P(Z \le z) = P(-\infty < Z \le z) = \int_{-\infty}^{z} \varphi(u)\, du = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du$$

Thus $\Phi(z)$ is the cumulative probability up to Z = z in the standard normal distribution.
The cumulative probabilities $\Phi(z)$ have been tabulated for various values of z in Table 9.

Fig. 10.4 Cumulative probability in standard normal distribution up to Z = z

Note that
$$\Phi(-\infty) = 0, \qquad \Phi(+\infty) = 1$$


Theorem 10.3 If $Z \sim N(0, 1)$ and a, b are any real numbers, then
(i) $P(Z < a) = \Phi(a)$
(ii) $P(Z > a) = 1 - \Phi(a)$
(iii) $P(a < Z < b) = \Phi(b) - \Phi(a)$
Theorem 10.4 If $Z \sim N(0, 1)$, then for any real value a (with $a \ge 0$ in (iii) and (iv)),
(i) $\Phi(-a) = 1 - \Phi(a)$
(ii) $P(Z > a) = \Phi(-a)$
(iii) $P(|Z| < a) = 2\Phi(a) - 1$
(iv) $P(|Z| > a) = 2\Phi(-a)$
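
The identities in Theorems 10.3 and 10.4 can be spot-checked numerically. This is a minimal sketch assuming Python's standard `statistics.NormalDist` in place of Table 9; the helper name `Phi` and the test points 1.26 and 1.96 are chosen here for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def Phi(z):
    """Cumulative probability P(Z <= z), playing the role of Table 9."""
    return Z.cdf(z)

a, b = 1.26, 1.96                      # illustrative points
upper_tail = 1 - Phi(a)                # Theorem 10.3 (ii)
between = Phi(b) - Phi(-b)             # Theorem 10.3 (iii) with limits -b and b
two_sided_inside = 2 * Phi(b) - 1      # Theorem 10.4 (iii)
two_sided_outside = 2 * Phi(-b)        # Theorem 10.4 (iv)

print(upper_tail, Phi(-a))             # equal, by Theorem 10.4 (ii)
print(between, two_sided_inside, two_sided_outside)
```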
10.2.4 Inverse Standard Normal Cumulative Distribution Function. The inverse standard
normal cumulative distribution function determines a value z corresponding to a given value of
the cumulative probability. Suppose that the cumulative probability at Z = z is p, then we have

$$\Phi(z) = P(Z \le z) = p$$
$$\Phi^{-1}(p) = z$$

The values of $\Phi^{-1}(p)$ have been tabulated for various values of cumulative probability p in
Table 10. For example,

$$\Phi(1.96) = 0.975$$
$$\Phi^{-1}(0.975) = 1.960$$
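
The direct and inverse look-ups can be mirrored in code. The sketch below assumes Python's standard `statistics.NormalDist`, whose `cdf` and `inv_cdf` methods stand in for Tables 9 and 10; it is an illustration, not a replacement for the tables.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

p = Z.cdf(1.96)        # direct use: cumulative probability at z = 1.96
z = Z.inv_cdf(0.975)   # inverse use: z-value with cumulative probability 0.975

print(round(p, 3), round(z, 3))
```

Applying `inv_cdf` to the result of `cdf` returns the original z, which is what "inverse" means here.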


10.2.5 Use of the Standard Normal Tables. We now show how the tables of the standard
normal distribution are used, illustrating their direct and inverse use.







Example 10.1
(i) Write down the equation of the standard normal distribution.
(ii) Find the value of the maximum ordinate of the standard normal curve correct to four places
of decimal.
(iii) Verify that the ordinates of the standard normal curve at z = 1.27 and z = -1.27 are
equal.
(iv) Find the value z when the ordinate at z is 0.12001.

Solution. (i) The probability density function of the standard normal random variable Z is
$$\varphi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \qquad -\infty < z < \infty$$
(ii) Since the standard normal probability density function is symmetric about zero, its
maximum ordinate is at z = 0:
$$\varphi(0) = \frac{1}{\sqrt{2\pi}}\, e^{0} = \frac{1}{\sqrt{2\pi}} = 0.3989$$
(iii) Either calculating directly or using Table 7, we have
$$\varphi(1.27) = \frac{1}{\sqrt{2\pi}}\, e^{-(1.27)^2/2} = 0.17810$$
$$\varphi(-1.27) = \frac{1}{\sqrt{2\pi}}\, e^{-(-1.27)^2/2} = 0.17810$$
Note $\varphi(-1.27) = \varphi(1.27)$.
Alternately,
$$\varphi(-1.27) = \frac{1}{\sqrt{2\pi}}\, e^{-(-1.27)^2/2} = \frac{1}{\sqrt{2\pi}}\, e^{-(1.27)^2/2} = \varphi(1.27)$$
(iv) By the inverse use of Table 7 and the fact that $\varphi(-z) = \varphi(z)$, we have
$$\varphi(z) = 0.12001$$
$$z = \varphi^{-1}(0.12001) = \pm 1.55$$
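
The ordinates in Example 10.1 can also be computed directly rather than read from Table 7. The sketch below is illustrative: the names `phi` and `phi_inverse` are introduced here, and the inverse is obtained by solving $\varphi(z) = y$ algebraically, $z = \pm\sqrt{-2\ln\left(y\sqrt{2\pi}\right)}$, which is an assumption of this example rather than the table-based method of the text.

```python
import math

def phi(z):
    """Ordinate of the standard normal curve at z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def phi_inverse(y):
    """Return the non-negative z with phi(z) = y; -z is the other solution."""
    if not 0 < y <= phi(0):
        raise ValueError("ordinate must lie in (0, phi(0)]")
    # max(...) guards against a tiny negative value caused by rounding at y = phi(0)
    return math.sqrt(max(0.0, -2 * math.log(y * math.sqrt(2 * math.pi))))

print(round(phi(0), 4))               # maximum ordinate
print(round(phi(1.27), 5), round(phi(-1.27), 5))
print(round(phi_inverse(0.12001), 2)) # positive solution; the negative one is its mirror
```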


Example 10.2 If Z is a standard normal random variable with mean 0 and variance 1,
then find
(i) $P(Z < -1.96)$  (ii) $P(Z > 1.26)$
(iii) $P(-1.96 < Z < 1.96)$  (iv) $P(-\infty < Z < 2.12)$
(v) $P(-2.72 < Z < \infty)$
Solution. From the definition of the standard normal cumulative distribution function, we have
(i) $P(Z < -1.96) = \Phi(-1.96) = 0.02500$ (From Table 9)
(ii) $P(Z > 1.26) = 1 - P(Z < 1.26) = 1 - \Phi(1.26) = 1 - 0.89617 = 0.10383$
(iii) $P(-1.96 < Z < 1.96) = P(Z < 1.96) - P(Z < -1.96) = \Phi(1.96) - \Phi(-1.96) = 0.97500 - 0.02500 = 0.95$
(iv) $P(-\infty < Z < 2.12) = P(Z < 2.12) - P(Z < -\infty) = \Phi(2.12) - \Phi(-\infty) = 0.98300 - 0 = 0.98300$ {since $\Phi(-\infty) = 0$}
(v) $P(-2.72 < Z < \infty) = P(Z < \infty) - P(Z < -2.72) = \Phi(\infty) - \Phi(-2.72) = 1 - 0.00326 = 0.99674$ {since $\Phi(+\infty) = 1$}
Example 10.3 If Z is a standard normal random variable with mean 0 and variance 1,
then find
(i) $P(Z < 1.282)$  (ii) $P(|Z| < 1.64)$
(iii) $P(|Z| > 2.37)$  (iv) $P(Z < -1.64 \text{ or } Z > 2.32)$
Solution. From the definition of the standard normal cumulative distribution function, we have
(i) $P(Z < 1.282) = \Phi(1.282)$ (By interpolating)
$$= \Phi(1.28) + \frac{1.282 - 1.28}{1.29 - 1.28}\left[\Phi(1.29) - \Phi(1.28)\right]$$
$$= 0.89973 + 0.2\,(0.90147 - 0.89973) = 0.900078$$
(ii) $P(|Z| < 1.64) = 2\Phi(1.64) - 1$ {since $P(|Z| < a) = 2\Phi(a) - 1$}
$= 2(0.94950) - 1 = 0.899$
(iii) $P(|Z| > 2.37) = 2\Phi(-2.37)$ {since $P(|Z| > a) = 2\Phi(-a)$}
$= 2(0.00889) = 0.01778$
(iv) $P(Z < -1.64 \text{ or } Z > 2.32) = P(Z < -1.64) + P(Z > 2.32)$
$= P(Z < -1.64) + 1 - P(Z < 2.32)$
$= \Phi(-1.64) + 1 - \Phi(2.32)$
$= 0.05050 + 1 - 0.98983 = 0.06067$
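
The interpolation step used in part (i) is ordinary linear interpolation between two neighbouring table entries. A minimal sketch follows; the function name `interpolate` is introduced here for illustration, and the table values are the ones quoted in Example 10.3.

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation: value at x on the line through (x0, y0) and (x1, y1)."""
    return y0 + (x - x0) / (x1 - x0) * (y1 - y0)

# Table 9 entries quoted in Example 10.3: Phi(1.28) = 0.89973, Phi(1.29) = 0.90147
phi_1282 = interpolate(1.282, 1.28, 0.89973, 1.29, 0.90147)
print(round(phi_1282, 6))
```

The same helper works for inverse look-ups, with the probabilities as the x-values and the z-values as the y-values.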
Example 10.4 If $Z \sim N(0, 1)$, then find the value of a such that
(i) $P(Z > a) = 0.868$  (ii) $P(|Z| < a) = 0.90$
(iii) $P(|Z| > a) = 0.238$  (iv) $P(Z < a) = 0.6198$
Solution. We have
(i) $P(Z > a) = 0.868$
$P(Z < a) = 1 - 0.868 = 0.132$
$\Phi(a) = 0.132$
$a = \Phi^{-1}(0.132) = -1.117$ {From Table 10 (a)}
(ii) $P(|Z| < a) = 0.90$
$2\Phi(a) - 1 = 0.90$ {since $P(|Z| < a) = 2\Phi(a) - 1$}
$\Phi(a) = 0.95$
$a = \Phi^{-1}(0.95) = 1.645$
(iii) $P(|Z| > a) = 0.238$
$2\Phi(-a) = 0.238$ {since $P(|Z| > a) = 2\Phi(-a)$}
$\Phi(-a) = 0.119$
$-a = \Phi^{-1}(0.119) = -1.18$
$a = 1.18$
(iv) $P(Z < a) = 0.6198$
$\Phi(a) = 0.6198$
$a = \Phi^{-1}(0.6198)$ (By interpolating)
$$= \Phi^{-1}(0.619) + \frac{0.6198 - 0.619}{0.620 - 0.619}\left[\Phi^{-1}(0.620) - \Phi^{-1}(0.619)\right]$$
$$= 0.3029 + 0.8\,(0.3055 - 0.3029) = 0.30498$$


10.2.6 Quantiles of Standard Normal Distribution. Let $0 < p < 1$, then the p-th quantile
or (100 p)-th percentile of the distribution of the standard normal random variable Z is a value $z_p$
such that

$$P(Z \le z_p) = p$$
$$\Phi(z_p) = p$$
$$z_p = \Phi^{-1}(p)$$

Fig. 10.5 The p-th quantile of standard normal distribution

Therefore, for the 95-th percentile $z_{0.95}$ of the standard normal random variable Z, we have

$$P(Z \le z_{0.95}) = 0.95$$
$$z_{0.95} = \Phi^{-1}(0.95) = 1.645$$


Example 10.5 If Z is a standard normal random variable, then find the lower and upper
quartiles, the inter quartile range, the quartile deviation and the 70-th percentile of the
distribution of Z.

Solution.
For the first quartile $z_{0.25}$, we have
$$P(Z \le z_{0.25}) = 0.25$$
$$\Phi(z_{0.25}) = 0.25$$
$$z_{0.25} = \Phi^{-1}(0.25) = -0.6745 \qquad \text{\{From Table 10 (a)\}}$$

Fig. 10.6 First quartile of standard normal distribution

For the third quartile $z_{0.75}$, we have
$$P(Z \le z_{0.75}) = 0.75$$
$$\Phi(z_{0.75}) = 0.75$$
$$z_{0.75} = \Phi^{-1}(0.75) = 0.6745 \qquad \text{\{From Table 10 (a)\}}$$

Fig. 10.7 Third quartile of standard normal distribution

The inter quartile range (I.Q.R) and the quartile deviation (Q.D) are
$$I.Q.R = z_{0.75} - z_{0.25} = 0.6745 - (-0.6745) = 1.349$$
$$Q.D = \frac{z_{0.75} - z_{0.25}}{2} = \frac{0.6745 - (-0.6745)}{2} = 0.6745$$

For the 70-th percentile $z_{0.70}$, we have
$$P(Z \le z_{0.70}) = 0.70$$
$$\Phi(z_{0.70}) = 0.70$$
$$z_{0.70} = \Phi^{-1}(0.70) = 0.5244 \qquad \text{\{From Table 10 (a)\}}$$

Fig. 10.8 Seventieth percentile of standard normal distribution
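
The quantiles found in Example 10.5 can be reproduced with an inverse cumulative distribution function. This sketch assumes Python's standard `statistics.NormalDist`, with `inv_cdf` standing in for Table 10 (a); the variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

q1 = Z.inv_cdf(0.25)   # lower quartile z_0.25
q3 = Z.inv_cdf(0.75)   # upper quartile z_0.75
iqr = q3 - q1          # inter quartile range
qd = iqr / 2           # quartile deviation
p70 = Z.inv_cdf(0.70)  # 70-th percentile z_0.70

print(round(q1, 4), round(q3, 4), round(iqr, 3), round(qd, 4), round(p70, 4))
```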


Exercise 10.1

1. (a) Define the normal probability density function and the normal cumulative distribution
function. Give the equation of the normal curve with mean $\mu$ and standard deviation $\sigma$.
(b) Define the standard normal probability density function and the standard normal
cumulative distribution function. Give the equation of the normal curve with mean 0
and standard deviation 1.

2. (a) Find the ordinates of the standard normal curve at:
(i) z = 0.64, (ii) z = 2.84,
(iii) z = -0.84, (iv) z = -2.08
(0.3251, 0.0071, 0.2897, 0.0459)
(b) Verify that the ordinates of the standard normal curve at z = 1.27 and z = -1.27 are
equal. Find the ordinate at z = 0.
(0.1781, 0.1781, 0.3989)

3. (a) If the random variable Z has the standard normal distribution, find
(i) $P(Z < 1.46)$  (ii) $P(Z > 2.58)$
(iii) $P(Z < -1.48)$  (iv) $P(Z > -1.96)$
(v) $P(0.56 < Z < 1.99)$  (vi) $P(-1.32 < Z < 1.65)$
(0.92785, 0.00494, 0.06944, 0.97500, 0.2644, 0.8571)
(b) If $Z \sim N(0, 1)$, verify that
(i) $P(Z < -2.15) = P(Z > 2.15)$
(ii) $P(Z < 1.86) = P(Z > -1.86)$
(0.01578, 0.01578, 0.96856, 0.96856)

4. (a) If $Z \sim N(0, 1)$, find
(i) $P(Z > 1.645)$  (ii) $P(Z < -1.645)$
(iii) $P(Z > 1.282)$  (iv) $P(Z > 1.96)$
(v) $P(Z > 2.576)$  (vi) $P(Z > 2.326)$
(vii) $P(Z > 2.808)$  (viii) $P(Z < -1.96)$
(0.05, 0.05, 0.0999, 0.025, 0.005, 0.01, 0.0025, 0.025)
(b) If $Z \sim N(0, 1)$, find
(i) $P(|Z| < 1)$  (ii) $P(|Z| < 1.96)$
(iii) $P(|Z| < 3)$  (iv) $P(|Z| > 2)$
(v) $P(|Z| < 1.78)$  (vi) $P(|Z| < 1.645)$
(vii) $P(|Z| > 2.326)$  (viii) $P(Z < -1.97 \text{ or } Z > 2.5)$
(0.6827, 0.95, 0.9973, 0.0456, 0.925, 0.9, 0.02, 0.03063)
(c) If $Z \sim N(0, 1)$, show that
(i) the central 95% of the distribution lies between $\pm 1.96$, i.e.,
$P(-1.96 < Z < 1.96) = 0.95$,
(ii) the central 99% of the distribution lies between $\pm 2.576$, i.e.,
$P(-2.576 < Z < 2.576) = 0.99$.

5. (a) If $Z \sim N(0, 1)$, find a if
(i) $P(Z < a) = 0.325$  (ii) $P(Z > a) = 0.025$
(iii) $P(|Z| < a) = 0.9$  (iv) $P(|Z| > a) = 0.097$
(-0.4538, 1.960, 1.645, 1.66)
(b) In a standard normal distribution, find
(i) a point that has 97.5% area below it, i.e., $z_{0.975}$,
(ii) a point that has 97.5% area above it, i.e., $z_{0.025}$,
(iii) two such points that contain the central 90% area, i.e., $z_{0.05}$ and $z_{0.95}$.
(1.96, -1.96, -1.645, 1.645)

6. (a) If $Z \sim N(0, 1)$, find a if $P(|Z| < a)$ takes the value (i) 80% (ii) 99%.
(1.282, 2.576)
(b) If $Z \sim N(0, 1)$, find a if $P(|Z| > a)$ takes the value (i) 5% (ii) 2%.
(1.96, 2.326)

7. (a) Find the median, the lower and the upper quartiles, and the inter quartile range for a
standard normal random variable Z.
(b) In a standard normal distribution,
(i) what is the value of the mode,
(ii) the area to the right of z = 1 is 0.1587, what is the area to the left of z = 1?
(iii) find two points on the z scale such that the area between them is 80%,
(iv) find the area between -1.5 and 2.5 on the z scale.
(0; 0.8413; -1.28, 1.28; 0.9270)


10.2.7 Use of the Standard Normal Tables for Any Normal Distribution. We now show
how the tables of the standard normal random variable Z can be used for any normal
random variable X where $X \sim N(\mu, \sigma^2)$.

Theorem 10.5 If $X \sim N(\mu, \sigma^2)$, then
$$F(x) = \Phi\left(\frac{x - \mu}{\sigma}\right)$$

Theorem 10.6 If $X \sim N(\mu, \sigma^2)$ and a, b are any real numbers, then
(i) $P(X < a) = \Phi\left(\dfrac{a - \mu}{\sigma}\right)$
(ii) $P(X > a) = 1 - \Phi\left(\dfrac{a - \mu}{\sigma}\right)$
(iii) $P(a < X < b) = \Phi\left(\dfrac{b - \mu}{\sigma}\right) - \Phi\left(\dfrac{a - \mu}{\sigma}\right)$

Theorem 10.7 If $X \sim N(\mu, \sigma^2)$ and a is any real number, then
$$f(a) = \frac{1}{\sigma}\, \varphi\left(\frac{a - \mu}{\sigma}\right)$$
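
Theorems 10.5 and 10.7 say that any normal probability or ordinate can be obtained from the standard normal functions after standardizing. The sketch below illustrates this under the assumption that Python's standard `statistics.NormalDist` supplies $\Phi$ and $\varphi$; the function names `normal_cdf` and `normal_pdf` and the values $\mu = 40$, $\sigma = 5$ are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: supplies Phi (cdf) and phi (pdf)

def normal_cdf(x, mu, sigma):
    """F(x) = Phi((x - mu) / sigma), as in Theorem 10.5."""
    return Z.cdf((x - mu) / sigma)

def normal_pdf(x, mu, sigma):
    """f(x) = (1 / sigma) * phi((x - mu) / sigma), as in Theorem 10.7."""
    return Z.pdf((x - mu) / sigma) / sigma

# Compare with a normal distribution built directly from mu and sigma
direct = NormalDist(40, 5)
print(normal_cdf(42.5, 40, 5), direct.cdf(42.5))
print(normal_pdf(42.5, 40, 5), direct.pdf(42.5))
```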








Example 10.6 If X is a normal random variable with $\mu = 40$ and $\sigma = 5$, write down its
probability density function. Find the ordinate of its normal curve at x = 42.5. Also find its
maximum ordinate.
Solution. We have
$$f(x) = \frac{1}{\sigma}\, \varphi\left(\frac{x - \mu}{\sigma}\right)$$
$$f(42.5) = \frac{1}{5}\, \varphi\left(\frac{42.5 - 40}{5}\right) = \frac{1}{5}\, \varphi(0.5) = \frac{1}{5}(0.35207) = 0.070414 \qquad \text{(From Table 7)}$$

Alternately, the probability density function of the normal random variable X with parameters
$\mu = 40$ and $\sigma = 5$ is
$$f(x) = \frac{1}{5\sqrt{2\pi}}\, e^{-(x-40)^2 / 2(5)^2} = \frac{1}{5\sqrt{2\pi}}\, e^{-(x-40)^2/50}$$
$$f(42.5) = \frac{1}{5\sqrt{2\pi}}\, e^{-(42.5-40)^2/50} = 0.0704$$

The maximum ordinate of this normal curve is at $x = \mu = 40$, which is
$$f(40) = \frac{1}{5}\, \varphi\left(\frac{40 - 40}{5}\right) = \frac{1}{5}\, \varphi(0) = \frac{1}{5}(0.39894) = 0.079788 \qquad \text{(From Table 7)}$$


Example 10.7 The scores made by candidates in a certain test are normally distributed with
mean 500 and standard deviation 100. What percent of the candidates received scores
(i) less than 400, (ii) more than 700,
(iii) between 400 and 600, (iv) which differ from the mean by more than 150,
(v) if a candidate gets a score of 680, what percent of the candidates have higher scores
than he?

Solution. Let X be the score of a candidate, then $\mu = 500$ and $\sigma = 100$.
(i) $P(X < 400) = P\left(\dfrac{X - \mu}{\sigma} < \dfrac{400 - 500}{100}\right) = P(Z < -1) = \Phi(-1) = 0.15866 = 15.87\%$
(ii) $P(X > 700) = P\left(\dfrac{X - \mu}{\sigma} > \dfrac{700 - 500}{100}\right) = P(Z > 2) = 1 - P(Z < 2)$
$= 1 - \Phi(2) = 1 - 0.97725 = 0.02275 = 2.28\%$
(iii) $P(400 < X < 600) = P\left(\dfrac{400 - 500}{100} < \dfrac{X - \mu}{\sigma} < \dfrac{600 - 500}{100}\right)$
$= P(-1 < Z < 1) = P(Z < 1) - P(Z < -1)$
$= \Phi(1) - \Phi(-1) = 0.84134 - 0.15866$
$= 0.68268 = 68.27\%$
(iv) $P(|X - \mu| > 150) = P\left(\dfrac{|X - \mu|}{\sigma} > \dfrac{150}{100}\right) = P(|Z| > 1.5) = 2\Phi(-1.5)$
$= 2(0.06681) = 0.13362 = 13.362\%$
(v) $P(X > 680) = P\left(\dfrac{X - \mu}{\sigma} > \dfrac{680 - 500}{100}\right) = P(Z > 1.8) = 1 - P(Z < 1.8)$
$= 1 - \Phi(1.8) = 1 - 0.96407 = 0.03593 = 3.59\%$
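
The five answers of Example 10.7 can be cross-checked without tables. This is a sketch assuming Python's standard `statistics.NormalDist`; the variable names are chosen for this illustration.

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=100)  # distribution of test scores

below_400 = scores.cdf(400)                       # (i)
above_700 = 1 - scores.cdf(700)                   # (ii)
between_400_600 = scores.cdf(600) - scores.cdf(400)  # (iii)
differ_by_150 = 2 * scores.cdf(500 - 150)         # (iv), using symmetry about the mean
above_680 = 1 - scores.cdf(680)                   # (v)

for label, prob in [("(i)", below_400), ("(ii)", above_700),
                    ("(iii)", between_400_600), ("(iv)", differ_by_150),
                    ("(v)", above_680)]:
    print(label, f"{100 * prob:.2f}%")
```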


Example 10.8 Given that the height of college boys is normally distributed with mean 5'-2"
and standard deviation 4" and that the minimum height required for joining the N.C.C. is
5'-4". Find the percentage of boys who would be rejected on account of their height.

Solution. Let X be the height of a college boy, then $\mu$ = 5'-2" = 62 inches and $\sigma$ = 4 inches.
The students with heights less than 5'-4" = 64 inches will be rejected for joining N.C.C. Then

$$P(X < 64) = P\left(\frac{X - \mu}{\sigma} < \frac{64 - 62}{4}\right) = P(Z < 0.5) = \Phi(0.5) = 0.69146 = 69.15\%$$


Example 10.9 In a normal distribution with mean $\mu$ and standard deviation $\sigma$ find
$P(\mu - \sigma \le X \le \mu + \sigma)$.

Solution. Let X be a normal random variable with mean $\mu$ and standard deviation $\sigma$, then

$$P(\mu - \sigma \le X \le \mu + \sigma) = P\left(\frac{\mu - \sigma - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{\mu + \sigma - \mu}{\sigma}\right)$$
$$= P(-1 \le Z \le 1) = P(Z \le 1) - P(Z \le -1)$$
$$= \Phi(1) - \Phi(-1) = 0.84134 - 0.15866 = 0.68268$$

Example 10.10 The diameters of ball bearings are normally distributed with mean 0.6140
inches and standard deviation 0.0025 inches. Determine the percentage of ball bearings with
diameters:
(i) less than 0.608 inches, (ii) greater than 0.617 inches,
(iii) between 0.610 and 0.618 inches inclusive, (iv) equal to 0.615 inches.

Solution. Let X be the diameter of a ball bearing, then $\mu = 0.6140$ inches and $\sigma = 0.0025$
inches. Considering the measurement errors, we apply continuity correction to the measurements.

(i) The diameter smaller than 0.608 inches is in fact the diameter less than 0.6075 inches.
Then
$$P(X < 0.6075) = P\left(\frac{X - \mu}{\sigma} < \frac{0.6075 - 0.6140}{0.0025}\right) = P(Z < -2.6) = \Phi(-2.6) = 0.00466 = 0.466\%$$
(ii) The diameter greater than 0.617 inches is in fact the diameter more than 0.6175 inches.
Then
$$P(X > 0.6175) = P\left(\frac{X - \mu}{\sigma} > \frac{0.6175 - 0.6140}{0.0025}\right) = P(Z > 1.4) = 1 - P(Z < 1.4) = 1 - \Phi(1.4)$$
$$= 1 - 0.91924 = 0.08076 = 8.076\%$$
(iii) The diameter between 0.610 and 0.618 inches inclusive is in fact the diameter between
0.6095 and 0.6185 inches. Then
$$P(0.6095 < X < 0.6185) = P\left(\frac{0.6095 - 0.6140}{0.0025} < \frac{X - \mu}{\sigma} < \frac{0.6185 - 0.6140}{0.0025}\right)$$
$$= P(-1.8 < Z < 1.8) = P(Z < 1.8) - P(Z < -1.8)$$
$$= \Phi(1.8) - \Phi(-1.8) = 0.96407 - 0.03593 = 0.92814 = 92.814\%$$
(iv) The diameter equal to 0.615 inches is in fact the diameter between 0.6145 and 0.6155
inches. Then
$$P(0.6145 < X < 0.6155) = P\left(\frac{0.6145 - 0.6140}{0.0025} < \frac{X - \mu}{\sigma} < \frac{0.6155 - 0.6140}{0.0025}\right)$$
$$= P(0.2 < Z < 0.6) = P(Z < 0.6) - P(Z < 0.2)$$
$$= \Phi(0.6) - \Phi(0.2) = 0.72575 - 0.57926 = 0.14649 = 14.649\%$$
10.2.8 De-standardizing. Sometimes it is required to find a value of X that corresponds to the
standardized value of Z. We use the relation
$$Z = \frac{X - \mu}{\sigma} \quad \Rightarrow \quad X = \mu + \sigma Z$$
10.2.9 Quantiles of a Normal Distribution. Let $0 < p < 1$, then the p-th quantile or
(100 p)-th percentile of the distribution of the standard normal random variable Z is a value $z_p$
such that
$$P(Z \le z_p) = p$$
$$\Phi(z_p) = p$$
$$z_p = \Phi^{-1}(p)$$
This value $z_p$ of the p-th quantile or (100 p)-th percentile of the standard normal random
variable $Z = (X - \mu)/\sigma$ can be de-standardized for determining the p-th quantile or
(100 p)-th percentile $x_p$ of any normal random variable X with parameters $\mu$ and $\sigma$ by
the relation
$$X = \mu + \sigma Z$$
Therefore
$$x_p = \mu + \sigma z_p$$
Example 10.11 If $X \sim N(50, 25)$, find the value of X which corresponds to a standardized
value
(i) -1.4, (ii) 0, (iii) 1.6
Solution. We have $X \sim N(50, 25)$, then $\mu = 50$ and $\sigma^2 = 25 \Rightarrow \sigma = 5$. Then
$$Z = \frac{X - \mu}{\sigma} = \frac{X - 50}{5} \quad \Rightarrow \quad X = 50 + 5Z$$
Putting the values (i) z = -1.4, (ii) z = 0, (iii) z = 1.6, we get
(i) For z = -1.4, we get
x = 50 + 5z = 50 + 5(-1.4) = 43
(ii) For z = 0, we get
x = 50 + 5z = 50 + 5(0) = 50
(iii) For z = 1.6, we get
x = 50 + 5z = 50 + 5(1.6) = 58


Example 10.12 If $X \sim N(70, 25)$, find
(i) a point that has 87.9% of the distribution below it,
(ii) a point that has 81.7% of the distribution above it,
(iii) two such points between which the central 70% of the distribution lies.

Solution. We have $X \sim N(70, 25)$, then $\mu = 70$ and $\sigma^2 = 25 \Rightarrow \sigma = 5$.
(i) Let a be the point that has 87.9% area below it. Then
$$P(X < a) = 87.9\% = 0.879$$
$$P\left(\frac{X - \mu}{\sigma} < \frac{a - 70}{5}\right) = 0.879$$
$$P\left(Z < \frac{a - 70}{5}\right) = 0.879$$
$$\Phi\left(\frac{a - 70}{5}\right) = 0.879$$
$$\frac{a - 70}{5} = \Phi^{-1}(0.879) = 1.17 \qquad \text{\{From Table 10 (a)\}}$$
$$a = 70 + 5(1.17) = 75.85$$

(ii) Let a be the point that has 81.7% area above it. Then
$$P(X > a) = 81.7\% = 0.817$$
$$P(X < a) = 1 - 0.817 = 0.183$$
$$P\left(\frac{X - \mu}{\sigma} < \frac{a - 70}{5}\right) = 0.183$$
$$P\left(Z < \frac{a - 70}{5}\right) = 0.183$$
$$\Phi\left(\frac{a - 70}{5}\right) = 0.183$$
$$\frac{a - 70}{5} = \Phi^{-1}(0.183) = -0.904$$
$$a = 70 + 5(-0.904) = 65.48$$

(iii) Let a, b be the two points between which 70% area lies. Then
$$P(a < X < b) = 70\% = 0.70$$
But $P(X < a) + P(a < X < b) + P(X > b) = 1$ (Total probability)
$$P(X < a) + 0.70 + P(X > b) = 1$$
$$P(X < a) + P(X > b) = 1 - 0.70 = 0.30$$
By symmetry $P(X < a) = P(X > b) = 0.30/2 = 0.15$, therefore

For the lower point a:
$$P(X < a) = 0.15$$
$$P\left(\frac{X - \mu}{\sigma} < \frac{a - 70}{5}\right) = 0.15$$
$$\Phi\left(\frac{a - 70}{5}\right) = 0.15$$
$$\frac{a - 70}{5} = \Phi^{-1}(0.15) = -1.0364$$
$$a = 70 + 5(-1.0364) = 64.818$$

For the upper point b:
$$P(X > b) = 0.15 \quad \Rightarrow \quad P(X < b) = 1 - 0.15 = 0.85$$
$$P\left(\frac{X - \mu}{\sigma} < \frac{b - 70}{5}\right) = 0.85$$
$$\Phi\left(\frac{b - 70}{5}\right) = 0.85$$
$$\frac{b - 70}{5} = \Phi^{-1}(0.85) = 1.0364$$
$$b = 70 + 5(1.0364) = 75.182$$
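
The de-standardizing relation $x_p = \mu + \sigma z_p$ used in Example 10.12 is what an inverse cumulative distribution function computes internally. The sketch below assumes Python's standard `statistics.NormalDist`; the variable names are illustrative, and small differences from the worked answers come from table rounding.

```python
from statistics import NormalDist

X = NormalDist(mu=70, sigma=5)  # X ~ N(70, 25)

a_below = X.inv_cdf(0.879)        # (i) 87.9% of the area lies below this point
a_above = X.inv_cdf(1 - 0.817)    # (ii) 81.7% of the area lies above this point
lower = X.inv_cdf(0.15)           # (iii) central 70%: 15% in each tail
upper = X.inv_cdf(0.85)

# The same lower point via x_p = mu + sigma * z_p
lower_destandardized = 70 + 5 * NormalDist().inv_cdf(0.15)

print(round(a_below, 2), round(a_above, 2), round(lower, 3), round(upper, 3))
```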


Example 10.13 If $X \sim N(24, 16)$, then find the 33-rd percentile.

Solution. We have $X \sim N(24, 16)$, then $\mu = 24$ and $\sigma^2 = 16 \Rightarrow \sigma = 4$.

For the 33-rd percentile $x_{0.33}$ or $P_{33}$, we have
$$P(X < x_{0.33}) = 0.33$$
$$P\left(\frac{X - \mu}{\sigma} < \frac{x_{0.33} - 24}{4}\right) = 0.33$$
$$P\left(Z < \frac{x_{0.33} - 24}{4}\right) = 0.33$$
$$\Phi\left(\frac{x_{0.33} - 24}{4}\right) = 0.33$$
$$\frac{x_{0.33} - 24}{4} = \Phi^{-1}(0.33) = -0.4399$$
$$x_{0.33} = 24 + 4(-0.4399) = 22.24$$

Fig. 10.9 Thirty third percentile of the given normal distribution
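10.2.10 Finding the values of $\mu$ or $\sigma$ or both.
Example 10.14 If $X \sim N(\mu, 25)$ and $P(X > 69.6) = 0.017$, find the value of the mean, $\mu$.
Solution. We have $X \sim N(\mu, 25)$, then $\sigma^2 = 25 \Rightarrow \sigma = 5$.
$$P(X > 69.6) = 0.017$$
$$P(X < 69.6) = 1 - 0.017 = 0.983$$
$$P\left(\frac{X - \mu}{\sigma} < \frac{69.6 - \mu}{5}\right) = 0.983$$
$$P\left(Z < \frac{69.6 - \mu}{5}\right) = 0.983$$
$$\Phi\left(\frac{69.6 - \mu}{5}\right) = 0.983$$
$$\frac{69.6 - \mu}{5} = \Phi^{-1}(0.983) = 2.120$$
$$\mu = 69.6 - 5(2.12) = 59$$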


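Example 10.15 If $X \sim N(50, \sigma^2)$ and $P(X < 60.6) = 0.983$, find the value of the
standard deviation, $\sigma$.

Solution. We have $X \sim N(50, \sigma^2)$, then $\mu = 50$.
$$P(X < 60.6) = 0.983$$
$$P\left(\frac{X - \mu}{\sigma} < \frac{60.6 - 50}{\sigma}\right) = 0.983$$
$$P\left(Z < \frac{10.6}{\sigma}\right) = 0.983$$
$$\Phi\left(\frac{10.6}{\sigma}\right) = 0.983$$
$$\frac{10.6}{\sigma} = \Phi^{-1}(0.983) = 2.120$$
$$10.6 = 2.120\,\sigma \quad \Rightarrow \quad \sigma = 5$$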


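Example 10.16 In a normal distribution 33% of the values are under 48 and 12.3% are
over 60. Find the mean and standard deviation of the distribution.
Solution. We have

For the lower value:
$$P(X < 48) = 33\% = 0.33$$
$$P\left(\frac{X - \mu}{\sigma} < \frac{48 - \mu}{\sigma}\right) = 0.33$$
$$\Phi\left(\frac{48 - \mu}{\sigma}\right) = 0.33$$
$$\frac{48 - \mu}{\sigma} = \Phi^{-1}(0.33) = -0.4399$$
$$48 - \mu = -0.4399\,\sigma \qquad \text{(i)}$$

For the upper value:
$$P(X > 60) = 12.3\% = 0.123 \quad \Rightarrow \quad P(X < 60) = 1 - 0.123 = 0.877$$
$$P\left(\frac{X - \mu}{\sigma} < \frac{60 - \mu}{\sigma}\right) = 0.877$$
$$\Phi\left(\frac{60 - \mu}{\sigma}\right) = 0.877$$
$$\frac{60 - \mu}{\sigma} = \Phi^{-1}(0.877) = 1.1601$$
$$60 - \mu = 1.1601\,\sigma \qquad \text{(ii)}$$

Subtracting (i) from (ii), we get
$$12 = 1.6\,\sigma \quad \Rightarrow \quad \sigma = 7.5$$
Putting this value of $\sigma$ in (ii), we have
$$60 - \mu = 1.1601(7.5) \quad \Rightarrow \quad \mu = 51.3$$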
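
The elimination carried out in Example 10.16 amounts to solving two linear equations $x_i - \mu = z_i \sigma$. A minimal sketch follows, assuming Python's standard `statistics.NormalDist` for the inverse look-ups; the function name `solve_mu_sigma` is introduced here for illustration.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal, used for the inverse look-ups

def solve_mu_sigma(x1, p1, x2, p2):
    """Solve x1 - mu = z1*sigma and x2 - mu = z2*sigma, where zi = Phi^{-1}(pi)."""
    z1, z2 = Z.inv_cdf(p1), Z.inv_cdf(p2)
    sigma = (x2 - x1) / (z2 - z1)  # subtract the first equation from the second
    mu = x1 - z1 * sigma           # back-substitute into the first equation
    return mu, sigma

# 33% of values are under 48; 12.3% are over 60, so 87.7% are under 60
mu, sigma = solve_mu_sigma(48, 0.33, 60, 1 - 0.123)
print(round(mu, 1), round(sigma, 1))
```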


Example 10.17 If X is a normal random variable with parameters $\mu = 50$ and $\sigma = 10$,
find its mean, median, mode, lower and upper quartiles, quartile deviation, mean deviation,
variance, standard deviation, first four moments about mean, moment ratios and moment
coefficient of skewness.
Solution. We have $\mu = 50$, $\sigma = 10$.

The mean, median, mode, lower and upper quartiles of the distribution are
Mean = $\mu$ = 50, Median = $x_{0.50}$ = $\mu$ = 50, Mode = $\mu$ = 50
$$x_{0.25} = \mu - 0.6745\,\sigma = 50 - 0.6745(10) = 43.255$$
$$x_{0.75} = \mu + 0.6745\,\sigma = 50 + 0.6745(10) = 56.745$$

The quartile deviation, mean deviation, variance and standard deviation of the distribution are
$$Q.D(X) = 0.6745\,\sigma = 0.6745(10) = 6.745$$
$$M.D(X) = 0.7979\,\sigma = 0.7979(10) = 7.979$$
$$Var(X) = \sigma^2 = (10)^2 = 100$$
$$S.D(X) = \sigma = \sqrt{100} = 10$$

The first four moments about mean, moment ratios and moment coefficient of skewness of the
distribution are
$$\mu_1 = 0, \qquad \mu_2 = \sigma^2 = (10)^2 = 100$$
$$\mu_3 = 0, \qquad \mu_4 = 3\sigma^4 = 3(10)^4 = 30000$$
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(0)^2}{(100)^3} = 0, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{30000}{(100)^2} = 3$$
$$\text{Moment coefficient of skewness} = \frac{\mu_3}{\sqrt{\mu_2^3}} = \frac{0}{\sqrt{(100)^3}} = 0$$


Example 10.18 In a normal distribution the lower and upper quartiles are 28 and 55 respectively.
Find the mean and standard deviation of the normal distribution.

Solution. We have $x_{0.25} = 28$ and $x_{0.75} = 55$. Then
$$\mu = \frac{x_{0.25} + x_{0.75}}{2} = \frac{28 + 55}{2} = 41.5$$
$$\sigma = \frac{x_{0.75} - x_{0.25}}{1.349} = \frac{55 - 28}{1.349} \approx 20$$
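
The two relations used in Example 10.18 can be wrapped in a small helper. This is a sketch under the assumption that Python's standard `statistics.NormalDist` supplies $z_{0.75}$ (so that $2 z_{0.75} \approx 1.349$); the function name `mu_sigma_from_quartiles` is hypothetical.

```python
from statistics import NormalDist

def mu_sigma_from_quartiles(q1, q3):
    """Recover (mu, sigma) of a normal distribution from its lower and upper quartiles."""
    z75 = NormalDist().inv_cdf(0.75)  # about 0.6745, so 2 * z75 is about 1.349
    mu = (q1 + q3) / 2                # the quartiles are equidistant from the mean
    sigma = (q3 - q1) / (2 * z75)
    return mu, sigma

mu, sigma = mu_sigma_from_quartiles(28, 55)
print(mu, round(sigma, 2))
```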


10.2.11 Normal Distribution as a Limit of a Frequency Distribution of a Continuous
Variable. A normal curve serves as a good approximation not only for a histogram obtained from
the binomial distribution but for many other histograms of observed frequency distributions of
continuous random variables as well. Frequently a histogram of an observed frequency
distribution with mean $\bar{x}$ and standard deviation s is well approximated for a large number of
observations $n = \sum f$ by the normal curve whose equation is given by

$$f(x) = \frac{n\,h}{s\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \bar{x}}{s}\right)^{2}}$$

where h is the common class interval in the grouped frequency distribution. The smaller the
value of h, the better the approximation will be.
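
As a rough illustration of fitting such a curve, the sketch below computes $\bar{x}$ and s from a grouped frequency distribution and evaluates the curve at the class midpoints to obtain approximate expected frequencies. The midpoints and frequencies are hypothetical data invented for this example, and the function name `fitted_curve` is an assumption.

```python
import math

def fitted_curve(x, n, h, mean, s):
    """Ordinate of the fitted normal frequency curve n*h/(s*sqrt(2*pi)) * exp(-((x-mean)/s)^2 / 2)."""
    return n * h / (s * math.sqrt(2 * math.pi)) * math.exp(-0.5 * ((x - mean) / s) ** 2)

# Hypothetical grouped data: class midpoints with common class interval h = 3
midpoints = [62, 65, 68, 71, 74]
freqs = [5, 18, 42, 27, 8]
h = 3

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, midpoints)) / n
s = math.sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints)) / n)

# Approximate expected frequency of each class: curve height at its midpoint
expected = [fitted_curve(x, n, h, mean, s) for x in midpoints]
for x, f, e in zip(midpoints, freqs, expected):
    print(x, f, round(e, 1))
```

The expected frequencies need not add up exactly to n, since the curve extends beyond the observed classes and the midpoint value only approximates the area over each class.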





Exercise 10.2 


1, @) If X is anormal random variable with 4 = 24 and o = 4, wnite down its probability 


density function. Find the ordinate of its normal curve at x = 21. Also find its 
maximum ordinate. 
(0.075285, 0.099735 ) 





Statistics — Part Il 





(6) 


(c) 


2. (a) 


(6) 


3. (a) 


(6) 


(c) 


Suppose that during periods of transcendental meditation, the reduction of a person’s 
oxygen consumption is a random variable have a normal distribution with f = 37.6c¢.c 
per minute and o = 4.6 c. c per minute. Find the probabilities that during a period of 
transcendental meditation a person’s oxygen will be reduced by 
(i) at most 35.0 c. c per minute, 
(ii) at least 44.5 c. c per minute, 


(iii) any where from 30.0 to 40.0 c. c. per minute. 
( 0.28604, 0.06681, 0.64900 ) 


Let X ~ N( 20, 25 ), find the area under the normal curve 
(i) below 30, 
(ti) above 30, 
(iii) between 30 and .42 

(0.97725, 0.02275, 0.02274) 


Suppose that it is know that IQ’s for adult Pakistanis are normally distributed with 
gt = 100 and o = 10. If an individual with an IQ of 130 or above is classified genius, 
what is the probability that a random selection yields a gcnius? 

(0.00135 ) 


The mean inside diameter of a sample of 250 washers produced by a machine is 5.05 
mm and the standard deviation is 0.05 mm. The purpose for which these washers are 
intended allows a maximum tolerance in the diameter of 4.95 mm to. 5.10 mm, 
otherwise the washers are considered defective. Determine the percentage of defective 
washers produced by the machine assuming the diameters are normally distributed 


_ (18.14% ) 


The length of lite for a washing machine is approximately normally distributed, with a 


mean of 3.5 years and a standard deviation of 1.0 years. If this type of washing 


machine is guaranteed for 12 months, what percentage of the sales will require 
replacement? 


(0.62% ) 


Assume that the time X required for a runner to run a mile is a normal random variable 
with parameters μ = 4 minutes 1 second and σ = 2 seconds. What is the 
probability that this athlete will run the mile 
(i) in less than 4 minutes, 
(ii) in more than 3 minutes 55 seconds. 
( 0.3085, 0.9987 ) 


Assume that the distance X that a particular athlete will be able to put a shot (on his 
first try) is normally distributed with parameters μ = 50 feet and σ = 5 feet. 
Compute the probability that he tosses it not less than 55 feet and the probability that 
his toss travels between 50 feet and 60 feet. 
( 0.1587, 0.4773 ) 


If X is normally distributed with parameters μ and σ², find the area under the curve 
between 
(i) (μ − σ) and (μ + σ), 
(ii) μ and (μ + 2σ). 
( 0.68268, 0.47725 ) 







4. (a) 


(b) 


5. (a) 


(b) 


(c) 


6. (a) 


(6) 


(c) 


The heights of boys at a particular age follow a normal distribution with mean 150.3 cm 
and standard deviation 5.0 cm. Find the probability that a boy picked at random from 
this age group has height 
(i) less than 153 cm, 
(ii) more than 145 cm, 
(iii) between 146 cm and 152 cm. 
( 0.67003, 0.83147, 0.5015 ) 


Suppose the weekly incomes X are normally distributed with mean 10.06 Rs. and 
variance 2.64 Rs. Find the probability P(8 ≤ X ≤ 12). Assume that the incomes are 
recorded to the nearest rupee. 
( 0.87569 ) 


Find the values of X which correspond to standardized values of −2.05 and 0.86 for 
each of the following distributions 
(i) X ~ N(62.3, 38), 
(ii) X ~ N(μ, σ²), 
(iii) X ~ N(a, b). 
( 49.66, 67.60; μ − 2.05σ, μ + 0.86σ; a − 2.05√b, a + 0.86√b ) 


If X ~ N( 100, 64 ), find the value of a such that P(X < a) = 0.95. 
( 113.16 ) 


If X ~ N( 60, 25 ), find the value of a such that P(X > a) = 0.837. 
( 55.09 ) 


In a normal distribution μ = 30 and σ = 5. Find 
(i) a point that has 15% area below it, 
(ii) a point that has 28% area above it, 
(iii) two points containing middle 95% area. 
( 24.8; 32.9; 20.2 and 39.8 ) 


The time required by a nurse to inject a shot of penicillin has been observed to be 
normally distributed, with a mean of μ = 30 seconds and a standard deviation of 
σ = 10 seconds. Find the 
(i) 10-th percentile, i.e., x(0.10), 
(ii) 90-th percentile, i.e., x(0.90). 
( 17.2 sec, 42.8 sec ) 


Scores on a national education achievement test are normally distributed with μ = 500 
and σ = 100. 
(i) What is the 95-th percentile of this distribution? 
(ii) What are the lower and upper quartiles of this distribution? 
(iii) If the university decides to accommodate the 40 percent of the students with 
the highest scores, what is the score that separates the successful applicants 
from the unsuccessful? 
( 664.5; 432.55, 567.45; 525.33 ) 




7. (a) 


(5) 


8. (a) 
(b) 

9. (a) 
(6) 

10. (a) 


(6) 


11. (a) 


(2) 


x 


12. (a) 


The height a high jumper will clear, each time he jumps, is a normal random variable 
with mean 6 feet and standard deviation 2.4 inches. 
(i) What is the greatest height he will jump with the probability 0.95? 
(ii) What is the height he will clear only 10% of the time? 
( 68.06 inches, 75 inches ) 


Suppose that the amount of vaccine required to immunize human beings against 
smallpox is normally distributed with μ = 0.250 ounce and σ = 0.040 ounce. 
Increasing the dosage increases the chances of successful vaccination. What is the 
minimum dosage required to produce success in 99 percent of the cases? 
( 0.34304 ounces ) 


If X ~ N( 70, 25 ), find the value of a such that P(| X − 70 | < a) = 0.8. Hence find 
the limits within which the central 80% of the distribution lies. 
( 6.41; 63.59, 76.41 ) 


Bags of flour packed by a particular machine have masses which are normally distributed with 
mean 500 g and standard deviation 20 g. 2% of the bags are rejected for being 
underweight and 1% of the bags are rejected for being overweight. Between what 
range of values should the mass of a bag of flour lie if it is to be accepted? 
( 458.92, 546.52 ) 


The lengths of items follow a normal distribution with mean μ cm and standard 
deviation 12 cm. It is known that 4.78% of the items have a length greater than 92 cm. 
Find the value of the mean μ. 
( 72 ) 


The lengths of rods produced in a workshop follow a normal distribution with mean μ 
and variance 4. If 10% of the rods are less than 17.4 cm long, find the probability that 
a rod chosen at random will be between 18 and 23 cm long. 
( 0.7725 ) 


Tea is sold in packages marked 750 g. The masses of the packages are normally 
distributed with mean 760 g and standard deviation σ. What is the maximum value of 
σ if less than 1% of the packages are underweight? 
( 4.299 ) 


Suppose that the life in hours of an electric tube manufactured by a certain process is 
normally distributed with parameters μ = 160 hours and σ hours. What is the 
maximum allowable value for σ, if the life X of a tube is to have probability 0.80 of 
being between 120 and 200 hours? 
( 31.21 ) 


Assume that we have a large number of students whose average weight is 150 lb and 
that the weights are normally distributed. If we know that 36.4% of the students have 
weights between 137 and 163 lb, what is the standard deviation of the weights? 
( 27.47 ) 


In a normal distribution μ = 40 and P( 25 < X < 55 ) = 0.8662. Find 
P( 20 < X < 60 ). 
( 0.9545 ) 


In a normal distribution 31% of the items are under 45 and 8% are over 64. Find the 
mean and standard deviation of the distribution. 
( 50, 10 ) 



(5) 


13. (a) 


(5) 


(c) 


14, (a) 


(5) 


15. (a) 


(5) 


16, (a) 


(b) 




If X ~ N(μ, σ²) and P(X < 35) = 0.20 and P(35 < X < 45) = 0.65, find 
μ and σ. 
( 39.5, 5.32 ) 


Assuming that the number of marks scored by a candidate is normally distributed, find 
the mean and the standard deviation, if the number of first class students (60% or more 
marks) is 25, the number of failed students (less than 30% marks) is 90 and the total 
number of candidates appearing for the examination is 450. 

( 40.37, 12.32) 

A marketing organization grades apples into three sizes, small (diameter less than 
60 mm), medium ( diameter between 60 and 80 mm ), and large ( diameter more than 
80 mm ). A certain grower finds that 61% of his crop falls into the small category, and 
14% into the large category. Assuming that the distribution of the diameter X of the 
apples is described by a normal probability density function, calculate the standard 
deviation and mean of his crop. 

( 24.97, 53.03 ) 

The maximum temperature on June 1 in a certain city has been recorded and 
observed as normally distributed over the years. About 15% of the time, it has exceeded 
30 °C, and about 5% of the time, it has been less than 20 °C. What is the mean and 
variance of the data? 
( 26.13 °C, 13.91 °C² ) 

A man cuts hazel twigs to make bean poles. He says that a stick is 240 cm long. In fact, 
the length of the stick follows a normal distribution and 10% are of length 250 cm or 
more while 55% have a length over 240 cm. Find the probability that a stick picked at 
random is less than 235 cm long. 
( 0.203 ) 


The 90-th percentile of a normal distribution is 50 while the 15-th percentile is 25. 
(i) Find μ and σ. 
(ii) What is the value of the 40-th percentile? 
( 36.17, 10.79, 33.44 ) 


The masses of articles produced in a particular workshop are normally distributed with 
mean μ and standard deviation σ. 5% of the articles have a mass greater than 
85 g and 10% have a mass less than 25 g. Find the values of μ and σ, and find the 
range symmetrical about the mean within which 75% of the masses lie. 
( 51.3, 20.5, 26.72, 73.88 ) 


In a certain examination, the percentage of passes and distinctions were 80 and 10 
respectively. Estimate the average marks obtained by the candidates, the minimum pass 
and distinction marks being 40 and 75 respectively; assume the distribution of marks 
to be normal. 
( 53.87, 16.48 ) 


What is the importance of normal distribution in statistical theory? Describe its 
properties. 


Suppose that X is normally distributed with μ = 25 and σ = 5. Find 
(i) the lower and upper quartiles, 




(c) 


(d) 


17. (a) 


(d) 


(c) 


(ii) the median, 


(iii) the mean deviation. 
(21.6, 28.4, 25, 4) 


In a normal distribution, the lower and upper quartiles are respectively 8 and 17. Find 
mean and standard deviation of the normal distribution. 
(12.5, 6.67 ) 


The continuous random variable X is normally distributed with mean μ and standard 
deviation σ. Given that P(X < 53) = 0.04 and P(X < 65) = 0.97, find the 
inter-quartile range of the distribution. 
( 4.46 ) 


The value of second moment about the mean in a normal distribution is 4. Find the third 
and the fourth moments about the mean in the distribution. 
(0, 48 ) 


Find the proportion of the area under the normal curve included between the limits 
μ ± σ, μ ± 2σ and μ ± 3σ, where μ and σ denote the mean and the standard 
deviation. 
( 0.6827, 0.9545, 0.9973 ) 


If X is a random variable with distribution N( 12.5, 9 ) and the random variable 
Y = g(X) = 2X + 5, find P(−∞ < Y ≤ 21) and P(45 < Y < ∞). 
( 0.06681, 0.00621 ) 










Exercise 10.3 
Objective Questions 

1. Fill in the blanks. 
(i) The normal distribution is a ________ distribution that ranges from −∞ to ∞. (continuous) 
(ii) The value of the parameter σ of a normal distribution is always ________. (positive) 
(iii) The normal distribution is a bell shaped ________ distribution. (symmetrical) 
(iv) If X ~ N(50, 25), then σ = ________. (5) 
(v) The maximum ordinate of the standard normal curve is at Z = ________. (0) 
(vi) In a standard normal distribution, if P(Z < z(0.975)) = 0.975, then z(0.975) = ________. (1.96) 
(vii) The maximum ordinate of a normal curve is at X = ________. (μ) 
(viii) The total area under a normal curve is ________. (unity) 
(ix) The ________ of a normal distribution corresponds to z = 0 in the standard normal distribution. (mean) 
(x) In a normal distribution, the mean, median and mode are ________. (equal) 

2. Fill in the blanks. 
(i) In a normal distribution, ________ = μ − 0.6745σ. (Q1) 
(ii) In a normal distribution, ________ = μ + 0.6745σ. (Q3) 
(iii) In a normal distribution, Q.D. = ________ σ. (2/3) 
(iv) In a normal distribution, M.D. = ________ σ. (4/5) 
(v) The limits μ ± 0.6745σ include ________ percent area under the normal curve. (50) 
(vi) The limits μ ± σ include ________ percent area under the normal curve. (68.27) 
(vii) The limits μ ± 2σ include ________ percent area under the normal curve. (95.45) 
(viii) The limits μ ± 3σ include ________ percent area under the normal curve. (99.73) 
(ix) In a normal distribution, all odd ordered moments about mean are ________. (zero) 
(x) In a normal distribution, β1 = 0 and β2 = ________. (3) 
(xi) The normal distribution is neither platykurtic nor leptokurtic but ________. (mesokurtic) 
(xii) The points of inflexion of a normal curve are ________ from mean. (equidistant) 

3. Mark off the statements as true or false. 
(i) Normal distribution has two parameters namely μ and σ². (true) 
(ii) If X is normally distributed with mean μ and variance σ², then it is denoted by X ~ N(μ, σ²). (true) 
(iii) The standard normal distribution has mean 0 and variance 1. (true) 
(iv) The maximum ordinate of a standard normal curve is at Z = 0. (true) 
(v) The standard normal distribution is symmetrical about Z = 0. (true) 
(vi) In a standard normal distribution, if P(Z < z(0.025)) = 0.025, then z(0.025) = −1.96. (true) 
(vii) In a standard normal distribution, if P(Z < z(0.975)) = 0.975, then z(0.975) = 1.96. (true) 
(viii) In a standard normal distribution, if P(|Z| < a) = 0.95, then a = 1.96. (true) 
(ix) The normal curve has maximum ordinate at X = 0. (false) 










4. Mark off the following statements as false or true. 
(i) The shape of a normal distribution depends upon its parameters namely μ and σ. (true) 
(ii) The parameter σ controls the relative flatness of the normal curve. (true) 
(iii) The normal distribution is a bell-shaped symmetrical distribution. (true) 
(iv) In a normal distribution, the mean, median and mode are equal. (true) 
(v) In a normal distribution, Q1 = μ − 0.6745σ and Q3 = μ + 0.6745σ. (true) 
(vi) In a normal distribution, mean and variance are always equal. (false) 
(vii) The expected value of a normal distribution is μ. (true) 
(viii) The standard deviation of a normal distribution is σ. (true) 
(ix) The quartile deviation of a normal distribution is 0.6745σ. (true) 
(x) The mean deviation of a normal distribution is 0.7979σ. (true) 
(xi) The two points containing the middle 95.45% area under a normal curve are μ ± σ. (false) 
(xii) In a normal distribution, all even ordered moments about mean are zero. (false) 
(xiii) In a normal distribution, all odd ordered moments about mean are zero. (true) 
(xiv) In a normal distribution, β1 = 0 and β2 = 3. (true) 
(xv) The points of inflexion of the normal curve lie at μ ± 2σ. (false) 
(xvi) The normal curve gets closer and closer to the x-axis but never touches it. (true) 





11 








SAMPLING 
TECHNIQUES 
AND SAMPLING 
DISTRIBUTIONS 





11.1 POPULATION (OR UNIVERSE) 


A population is the totality of the observations made on all the objects (under 
investigation) possessing some common specific characteristics, which are of particular interest to 
researchers. 


The population is the aggregate of the elements and these elements are the basic units 
that comprise and define a population. The population must be defined in terms of 


(i) content, (ii) unit, (iii) extent, (iv) time 


For instance, for the students of the first year class at a given college, the characteristic to be 
investigated may be the score received by each student in a college entrance examination in a 
given year. Populations may be finite or infinite. 


11.1.1 Finite Population. A population is said to be finite if it includes a limited number of 
elementary units (objects or observations). 


Examples of a finite population are: the heights of all the students enrolled at a college 
in a given year, the wages of all employees of a steel mill in a given year, the amount of money 
spent by each student in an engineering university in a given academic year, or the grading of 
items as defective and non-defective that are produced by an industry on a given day. 

11.1.2 Infinite Population. A population is said to be infinite if it consists of unlimited 


number of elementary units. At least hypothetically, there is no limit to the number of units it can 
include. 


Examples of an infinite population are: the weights at birth of all human beings, the 
results obtained by rolling of a die, the lifetimes of all the bulbs produced in a production process 
that operates indefinitely under given manufacturing conditions. 

11.2 SAMPLE 


A sample is a part of the population which is selected with the expectation that it will 
represent the characteristics of the population. 


11.2.1 Sampling. Sampling is a procedure of selecting a representative sample from a given 
population. 

11.2.2 Sample Survey versus Complete Enumeration. The collection of information from a 
part of the population is called making a sample survey. The collection of information from all 
elements in a population is called taking a census or making a complete enumeration. 







11.2.3 Purposes of Sampling. The two basic purposes of sampling are: 


(i) To obtain maximum information about the characteristics of the population with 
minimum cost, time and effort. 


(ii) To find the reliability of the estimates derived from the sample. 


11.2.4 Advantages of Sampling. Following are the main advantages of sampling over a 
complete census. 


(i) Time Saving: A sample survey involves less time and energy than a 
complete enumeration, both in the execution and in the analysis of data. This is a vital 
consideration when the information is urgently needed, as the results from a sample 
survey are more readily available. 


(ii) Economic: A sample survey requires less expense and labour as compared to a 
complete census because the cost of covering only a fraction will be lower than that of 
covering the whole population. 


(iii) Accuracy: A sample survey provides the results which are almost as accurate as those 


obtained by complete census. A properly designed and carefully executed sample survey 
will provide even better results. 


(iv) Feasibility: Sometimes the data are obtained by tests that are destructive. For example, 
to know the average life of a certain type of electric bulb, we shall take a sample of these 
bulbs and keep them on until they burn out. We cannot think of testing the whole lot. In 
testing the blood of a patient we do not drain the entire blood out of him but examine just a 
few drops. Sampling may be the only means available for obtaining the desired 
information when the population is infinite or inaccessible. In such cases complete 
enumeration would neither be physically possible nor practically feasible. 


Whatever be the merits of sampling, it cannot totally replace a complete census. A 
census is a record of a nation’s history and its importance has to be given due acknowledgement. 


11.2.5 Limitations of Sampling. If the basic facts of each and every unit in the population 
are needed, a census becomes indispensable. The sample will not meet such a requirement. 


For example, the list of income tax payers is prepared very carefully, the list of voters is 
prepared to include the name of each and every voter, or an inventory of all goods and stocks is 
necessary to know the total amount of stocks of a firm. 


11.3 SAMPLING DESIGN 


A sampling design is a procedure or plan for obtaining a sample from a given population 
prior to collecting any data. 


The collection of detailed information is known as survey. When a survey is carried out 
by a sampling design, it is called a sample survey. A sample survey should be properly planned 
and carefully executed in order to avoid inaccuracies. 


11.3.1 Sampling Units. Sampling units are those basic units of the population in terms of 
which the sample design is planned. 


The sampling units must be distinct and exhaustive, i.e., they must make up the whole 
population and they must not be overlapping. Sometimes the sampling unit is obvious, as in a 
population of students or in a population of light bulbs. Sometimes there is a choice of sampling 
unit. In sampling an agricultural crop, the sampling unit might be a field, a farm or an area of land 
whose shape and area is at our disposal. In sampling a human population, the unit might be an 
individual person, the household or all the persons living in a block. 


11.3.2 Sampling Frame. A sampling frame is a complete list of the sampling units. 


For example, a complete list of all the students in a college on May 10, 1995, is the 
frame. A complete list of all households in a city is another example of the frame. 


11.3.3 Types of Sampling Designs. Meaningfulness of estimates, obtained from a sample, 
depends upon the methods of selecting a sample. Broadly speaking there are two different 
sampling schemes. 

(i) Non-probability sampling 

(ii) Probability sampling 


11.4 NON-PROBABILITY (NON-RANDOM) SAMPLING 

A non-probability sampling is a procedure in which we cannot assign to an element of 
population the probability of its being included in the sample. 

We often make inferences about the population from arbitrary and informal samples. A 
wheat dealer forms his opinion about a sackful of wheat by examining just a few grains. To say 
about the quality of rice cooked in a big pot, the cook takes only a spoonful of rice to taste and 
decide on its quality of cooking. Such arbitrary selections are frequently made in research, in 
biological and physical sciences. 


11.5 PROBABILITY (RANDOM) SAMPLING 


A probability sampling is a process in which the sample is selected in such a way that 
every element of a population has a known non-zero (not necessarily equal) probability of being 
included in the sample. 


The advantage of probability sampling is that it provides a measure of precision of the 
estimates. The underlying principle of a random sample is that the personal factor is eliminated in the 
selection, as the investigator does not exercise his discretion in the choice of items. No factor other 
than chance affects the likelihood of an item being included in or excluded from the sample. A 
random sample may be taken with or without replacement. 

11.5.1 Random Sampling With Replacement. Sampling is said to be with replacement from 
a population ( finite or infinite ) when the unit selected at random is returned to the population 
before the next unit is selected. The formal description of the sampling method is as follows. 


1. An object is selected from the population in a way that gives all objects in the population 
an equal chance of being selected. 


2. The characteristics level of the object selected is observed, and the object is returned to 
the population prior to any subsequent selection. 


3. For a sample of size n, steps (1) and (2) are performed n times. 


Thus the number of units available for future drawing is not affected. The population 
remains the same and a sampling unit might be selected more than once. 


11.5.2 Random Sampling Without Replacement. Sampling is said to be without replacement 
when the sampling unit selected at random is not returned to the population before the next unit 
is selected. The formal description of the sampling method is as follows. 


1. The first object is selected from the population in a way that gives all objects in the 
population an equal chance of being selected. 






2. The characteristics level of the object selected is observed, but the object is not returned 
to the population. 


3. An object is selected from the remaining objects in the population in a way that gives 
all the remaining objects an equal chance of being selected, and step (2) is repeated. 
For a sample of size n, step (3) is performed (n − 1) times. 


Thus the number of units remaining after each drawing will be reduced by one. In this 
case, a sampling unit selected once cannot be selected again for the sample because the selected 
unit is not replaced. 
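
The two procedures above can be sketched in Python as follows. This is only an illustration under stated assumptions: the five-unit population, the sample size 3 and the seed are made up, and the function names are hypothetical rather than taken from the text.

```python
import random

def draw_with_replacement(population, n, rng):
    """Draw n units; each selected unit is returned before the next draw."""
    return [rng.choice(population) for _ in range(n)]

def draw_without_replacement(population, n, rng):
    """Draw n units; a selected unit is removed and cannot be drawn again."""
    remaining = list(population)          # copy, so the caller's list is untouched
    sample = []
    for _ in range(n):
        index = rng.randrange(len(remaining))   # equal chance for every remaining unit
        sample.append(remaining.pop(index))     # one unit fewer after each draw
    return sample

rng = random.Random(2019)                 # fixed seed (an assumption) for repeatability
population = ["A", "B", "C", "D", "E"]    # hypothetical population, N = 5
with_repl = draw_with_replacement(population, 3, rng)
without_repl = draw_without_replacement(population, 3, rng)
print(with_repl, without_repl)
```

In the first function the same unit may appear more than once, while in the second every unit in the sample is necessarily distinct.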


11.6 SIMPLE RANDOM SAMPLING 


Simple random sampling is a procedure of selecting a sample of n units 
(n = 1, 2, ..., N) from the population of N units in such a way that: 


(i) Every unit available for sampling has an equal probability of being drawn. 
(ii) Every sample of size n has the same probability of being selected. 

A sample drawn by this procedure is called a simple or unrestricted random sample. A 
sample containing n elements selected from a population consisting of N elements is called a 
sample of size n. Simple random sampling is used in a population which is essentially 
homogeneous in terms of some characteristics relevant to the enquiry. For small populations 
where the elements are easily identifiable and accessible, simple random sampling may be easy to 
apply. 


Theorem 11.1 If a simple random sample of size n is selected from a finite population of size 
N, then the number of all possible samples is given as 


No. of possible samples = $N^n$, if sampling is done with replacement 
No. of possible samples = ${}^{N}P_{n}$, if sampling is done without replacement 


Proof. A sample of size n under simple random sampling ( with or without replacement ) 
consists of an ordered specification of n elements, namely 


( the first chosen, the second chosen, ---, the n-th chosen. ) 


Sampling With Replacement. If we use sampling with replacement, the number of units 
available for each drawing are N. 


The first unit of the sample can be selected in N different ways, the second unit of the sample 
can also be selected in N ways, the third unit of the sample can also be selected in N ways, and 
so on, the n-th unit of the sample can also be selected in N ways. 


Using the multiplicative principle, the number of all possible samples of size n, that could be 
selected from a finite population of size N, is 


No. of possible samples = $N \times N \times N \times \cdots$ (n times) = $N^n$ 


A sample of n units constitutes only one arrangement, and there are $N^n$ possible arrangements 
of n units from a finite population of N units. Each of the $N^n$ possible samples is selected with 
the same probability 


$$\frac{1}{N} \cdot \frac{1}{N} \cdot \frac{1}{N} \cdots (n \text{ times}) = \frac{1}{N \times N \times N \times \cdots (n \text{ times})} = \frac{1}{N^n}$$







Sampling Without Replacement. If we use sampling without replacement, the number of units 
remaining after each drawing will be reduced by one. 


The first unit of the sample can be selected in N different ways, the second unit of the sample 
can be selected in (N − 1) ways, the third unit of the sample can be selected in (N − 2) 
ways, and so on, the n-th unit of the sample can be selected in (N − n + 1) ways. 


Using the multiplicative principle, the number of all possible samples of size n, that could be 
selected from a finite population of size N, is 


$$\begin{aligned}
\text{No. of possible samples} &= N(N-1)(N-2)\cdots\{N-(n-1)\} \\
&= N(N-1)(N-2)\cdots(N-n+1) \\
&= \frac{N(N-1)\cdots(N-n+1)(N-n)\cdots(3)(2)(1)}{(N-n)\cdots(3)(2)(1)} \\
&= \frac{N!}{(N-n)!} = {}^{N}P_{n}
\end{aligned}$$


A sample of n units constitutes only one arrangement, and there are ${}^{N}P_{n}$ possible arrangements 
of n units from a finite population of N units. Each of the ${}^{N}P_{n}$ possible samples is selected 
with the same probability 


$$\frac{1}{N} \cdot \frac{1}{N-1} \cdots \frac{1}{N-n+1} = \frac{1}{N(N-1)\cdots(N-n+1)} = \frac{1}{{}^{N}P_{n}}$$

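
As a check on Theorem 11.1, the following Python sketch enumerates every ordered sample from a small population and compares the counts with the two formulas. The four unit labels and the sample size n = 2 are assumptions chosen only for illustration.

```python
from itertools import permutations, product
from math import perm

population = ["a", "b", "c", "d"]   # hypothetical population, N = 4
N = len(population)
n = 2                               # sample size (an assumption)

# Ordered samples when a drawn unit is returned before the next draw.
with_replacement = list(product(population, repeat=n))

# Ordered samples when a drawn unit is not returned.
without_replacement = list(permutations(population, n))

assert len(with_replacement) == N ** n          # N^n
assert len(without_replacement) == perm(N, n)   # N! / (N - n)!

print(len(with_replacement), len(without_replacement))   # prints: 16 12
```

Each of the 16 samples with replacement therefore has probability 1/16, and each of the 12 samples without replacement has probability 1/12.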
11.6.1 Random Digits. A table of random digits consists of a sequence of digits designed to 
represent the result of a simple random sampling with replacement from a population of digits 0, 
1, 2, ..., 9. In a table of random digits each digit from 0 to 9 is called a random digit, each 
having the probability of occurrence of 1/10. Here random implies that all of these digits have 
the same probability of occurrence and the occurrence and non-occurrence of any digit is 
independent of the occurrence and non-occurrence of all other digits. Table 15 is such a table. 

In a random digits table, random digits are normally combined to form numbers of more 
than one digit. For example, random digits taken in pairs will result in a set of 100 different 
numbers from 00 to 99, each having a probability of occurrence of 1/100 and each being 
independent of other numbers similarly formed. Likewise random digits taken in triples will 
result in a set of 1000 different numbers from 000 to 999, each having a probability of 
occurrence of 1/1000 and each being independent of other numbers similarly formed. Similarly, 


random digits taken in quadruples will result in 10000 different numbers from 0000 to 9999, 
each having a probability of occurrence of 1/ 10000, and each being independent of other 
numbers similarly formed. 
11.6.2 Selection of Simple Random Sample. A simple random sample can be selected by the 
following methods. 
(i) Lottery Method. In this method, a distinct and different serial number from 1 to N is 
assigned to every unit of the population of N units and the number is recorded on a card 
or a slip of paper. All the numbered slips are then placed in a container, and they are 
thoroughly mixed. A blind selection is made of the number of slips required to constitute 
the desired size of the sample. The items corresponding to the slips drawn will constitute 
the random sample. The selection of items depends entirely on chance. Some lotteries 
use a rotating wheel in selecting tickets. The wheel has equal segments on its rim, one 
for each of the digits 0 through 9. N lottery tickets are numbered from 1 to N. 
Suppose the tickets have three-digit numbers. A ticket number then would be selected by 
spinning the wheel thrice and recording the digit which appears at the pointer each time 
the wheel stops. If the digit sequence is 534, then the ticket number 534 is selected. 
The lottery method becomes quite cumbersome to use as the size of the population becomes 
large; then an alternative method of selection of a random sample is employed. 

(ii) Using Random Digits. In this method a distinct sampling number from 0 to (N —1) 
is assigned to every unit of the population of N units. A table of random digits is 
consulted with a randomly selected starting point in the table. The table is read in single 
digits, in groups of two, three or more according to the number of digits in the sampling 
number (N — 1) assigned to the last unit in the population. Any number greater than 
(N — 1) is discarded. A number appearing second time is also discarded if the 
sampling is without replacement. Continue the process of selecting the random digits or 
numbers until the desired sample size is reached. 
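
The random digits procedure just described can be sketched in Python as follows. The function name is hypothetical, and the row of digits is made up for the illustration; it is not taken from Table 15.

```python
def sample_by_random_digits(N, n, digits, replace=False):
    """Select n sampling numbers from 0..N-1 by reading a stream of random digits.

    digits is an iterator over single digits 0-9, read left to right.
    """
    width = len(str(N - 1))      # digits per group, set by the largest number N - 1
    chosen = []
    while len(chosen) < n:
        group = "".join(str(next(digits)) for _ in range(width))
        number = int(group)
        if number > N - 1:
            continue             # discard numbers beyond the last sampling number
        if not replace and number in chosen:
            continue             # discard repeats when sampling without replacement
        chosen.append(number)
    return chosen

# Hypothetical row of random digits, read in pairs: 95, 03, 03, 79, 12.
row = iter([9, 5, 0, 3, 0, 3, 7, 9, 1, 2])
chosen = sample_by_random_digits(N=80, n=3, digits=row)
print(chosen)   # prints: [3, 79, 12]
```

Here 95 is discarded because it exceeds N − 1 = 79, and the second 03 is discarded because the sampling is without replacement.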


11.7 STRATIFIED SAMPLING 


If the elements in the population are not homogeneous, then the population is divided into 
non-overlapping homogeneous subgroups, called strata, and a sample is drawn separately from 
each stratum by simple random sampling. This sample is called a stratified random sample. The 
process of dividing a heterogeneous population into homogeneous subgroups is called 
stratification. 

The benefit of this method is that if non-overlapping homogeneous subgroups of the 
population can be identified, then only a relatively small number of observations are needed to 
ascertain the characteristics of each subgroup. Stratification is used also to improve sample 
estimates of population characteristics. Stratification is used: 

(i) to provide an adequate sample for each stratum, 


(ii) because it can give more precise estimates of population characteristics than other types 
of samples. 
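
A minimal Python sketch of stratified random sampling follows. The two strata, their sizes and the number of units taken from each stratum are assumptions made up for the illustration; the text does not prescribe how the sample is to be divided among the strata.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Draw a simple random sample (without replacement) separately from each stratum.

    strata maps a stratum name to its list of units;
    sizes maps the same names to the number of units to draw.
    """
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

rng = random.Random(11)   # fixed seed (an assumption) for repeatability

# Hypothetical non-overlapping strata of a college population.
strata = {
    "first_year": [f"F{i}" for i in range(1, 61)],    # 60 students
    "second_year": [f"S{i}" for i in range(1, 41)],   # 40 students
}
sizes = {"first_year": 6, "second_year": 4}           # assumed allocation

sample = stratified_sample(strata, sizes, rng)
print(sample)
```

Because a separate draw is made within each stratum, every stratum is certain to be represented in the final sample.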


11.8 ERRORS 


11.8.1 True Value. By true value we mean the value that would be obtained if no errors were 
made in any way in obtaining the information or computing the characteristic of the population. 


True value of the population is possibly obtained only if the exact procedures are used 
for collecting the correct data, each and every element of the population has been covered and no 
mistake or even the slightest negligence has happened during the process of data collection and its 
analysis. It is usually regarded as an unknown constant. 


11.8.2 Accuracy. By accuracy we refer to the difference between the sample result and the 
true value. The smaller the difference, the greater will be the accuracy. Accuracy can be 
increased: 

(i) By elimination of technical errors. 

(ii) By increasing the sample size. 


11.8.3 Precision. By precision we refer to how closely we can reproduce, from a sample, the 
results which would be obtained if a complete count (census) was taken using the same method of 
measurement. 


11.8.4 Error. The difference between an estimated value and the population true value is 
called an error. A sample estimate is used to describe a characteristic of a population, but a 
sample, being only a part of a population, cannot provide a perfect representation of the 
population. One may ask how close the sample estimate will be to the true value. 
Generally it is seen that an estimate is rarely equal to the true value. There are two kinds of errors: 

(i) Sampling (random) errors 

(ii) | Non-sampling (non-random) errors 
11.8.5 Sampling Error. A sampling error is the difference between the value of a statistic 
obtained from an observed random sample and the value of corresponding population parameter 
being estimated. 

A sample may not provide a true representation of the population under study, simply
because a sample represents only a part of the population and thus depends on "the luck of the draw",
even if the sample survey is properly designed and well implemented. In general, let T be the
sample statistic used to estimate the population parameter $\theta$; then the sampling error, denoted by
E, is defined as

$E = T - \theta$

The value of the sampling error reveals the precision of the estimate: the smaller the sampling
error, the greater will be the precision of the estimate. The sampling errors can be reduced:

(i) By increasing the sample size.

(ii) By improving the sampling design.

(iii) By using the supplementary information.
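
The definition $E = T - \theta$ can be illustrated with a small sketch. The population below is hypothetical, T is taken to be the sample mean, and the function names are assumptions made only for this illustration. The code lists the sampling error of every possible sample drawn with replacement and shows that the average size of the error shrinks when the sample size is increased.

```python
from itertools import product

# Hypothetical population; its mean plays the role of the parameter theta.
population = [2, 4, 6, 8]
theta = sum(population) / len(population)          # true value of the parameter

def sampling_errors(pop, n):
    """Map every ordered sample of size n (drawn with replacement) to E = T - theta,
    where T is the sample mean."""
    true_mean = sum(pop) / len(pop)
    return {s: sum(s) / n - true_mean for s in product(pop, repeat=n)}

def mean_absolute_error(pop, n):
    """Average size of the sampling error over all equally likely samples."""
    errors = sampling_errors(pop, n)
    return sum(abs(e) for e in errors.values()) / len(errors)

errors = sampling_errors(population, 2)
print(errors[(2, 4)])                              # sampling error of the sample (2, 4)
print(mean_absolute_error(population, 2))          # average |E| for n = 2
print(mean_absolute_error(population, 4))          # average |E| for n = 4 (smaller)
```

The errors are positive for some samples and negative for others; it is their typical size, not their sign, that decreases as n grows.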
11.8.6 Non-sampling Errors. The errors that are caused by sampling the wrong population of
interest and by response bias, as well as those made by an investigator in collecting, analysing and
reporting the data, are all classified as non-sampling or non-random errors. These errors are
present in a complete census as well as in a sample survey.
11.8.7 Bias. Bias is the difference between the expected value of a statistic and the true value
of the parameter being estimated. Let T be the sample statistic used to estimate the parameter $\theta$;
then the amount of bias is

$\text{Bias} = E(T) - \theta$

The bias is positive if $E(T) > \theta$, it is negative if $E(T) < \theta$, and it is zero if
$E(T) = \theta$. Bias is a systematic component of error which refers to the long-run tendency of the
sample statistic to differ from the parameter in a particular direction. Bias is cumulative and
increases with the increase in size of the sample. If proper methods of selection of units in a
sample are not followed, the sample results will not be free from bias.
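
As a hedged illustration of Bias = E(T) - θ, the sketch below takes a hypothetical population of four values and computes the expected value of two variance statistics (divisor n and divisor n - 1) over all 16 samples of size 2 drawn with replacement. The population, the choice of statistics and the function names are assumptions made for this example only.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]                      # hypothetical population
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N    # parameter being estimated

def expected_value(statistic, pop, n):
    """E(T): average of the statistic over all equally likely ordered samples."""
    samples = list(product(pop, repeat=n))
    return sum(statistic(s) for s in samples) / len(samples)

def variance_divisor_n(sample):
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / len(sample)

def variance_divisor_n_minus_1(sample):
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

bias_n = expected_value(variance_divisor_n, population, 2) - sigma2
bias_n_minus_1 = expected_value(variance_divisor_n_minus_1, population, 2) - sigma2
print(bias_n, bias_n_minus_1)
```

In this setting the statistic with divisor n has a negative bias (it underestimates the population variance on the average), while the statistic with divisor n - 1 has zero bias.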





Exercise 11.1 





1. (a) Explain the terms: Population; Sample; Sampling frame; Sampling unit.

   (b) Define suitable populations from which the following samples are selected:
       (i) One thousand homes are called by telephone in the city of Karachi and asked to
           name the T.V. programme that they are now watching.
       (ii) A coin is flipped 53 times and 32 heads are recorded.
       (iii) Two hundred pairs of a new type of combat boots were tested for durability in
           Vietnam and, on the average, lasted two months.
       { (i) Homes in Karachi city having telephones and T.V.,
         (ii) An infinite number of tosses of a coin,
         (iii) Total production of a new type of combat boots during a particular period. }

   (c) In each of the following situations, determine whether the sampling is done from a finite
       population or an infinite population and then define the population.
       (i) A coin is tossed 20 times and 12 heads are recorded.
       (ii) Ten employees of a large manufacturing company are selected as representatives
           of labour to serve as a labour management committee.
       (iii) A sample of bulbs is selected periodically to determine the number of defective
           bulbs produced by a production unit.
       (iv) A coin is weighed 15 times to estimate its true weight.
       { (i) Infinite; the population is all the potential tosses of the coin,
         (ii) Finite; the population is all the employees of the company,
         (iii) Infinite; the population is all the bulbs produced by the production unit,
         (iv) Infinite; the population is all the potential weights of the coin. }

2. (a) What is meant by sampling? Describe the advantages of sampling over complete
       enumeration.

   (b) For each of the following reasons, give an example of a situation for which a census
       would be less desirable than a sample. In each case, explain why this is so.
       (i) Economy                (ii) Timeliness
       (iii) Size of population   (iv) Inaccessibility
       (v) Accuracy               (vi) Destructive observations

   (c) Distinguish between the following:
       (i) Population and sample.
       (ii) Sampling with and without replacement.

3. (a) Distinguish between probability and non-probability sampling, giving examples.
       Describe the advantages of using a probability sample.

   (b) What do you understand by a simple random sample? By taking some artificial
       example, explain the method of drawing a simple random sample.

   (c) Distinguish between the following:
       (i) Random sampling and simple random sampling.
       (ii) Simple random sampling and stratified random sampling.
       (iii) Sampling and non-sampling errors.

4. (a) Explain how you would select a random sample of 10 households from a list of 250
       households, by using a table of random digits.

   (b) A poll is to be conducted to determine the voting preference of the voters in a certain
       city. Design a sampling plan such that the sample would be representative of the
       population of all the voters.

   (c) In a certain locality, there are 300 households. We wish to select a sample of 50
       households. How would you select this sample using random numbers?

5. (a) What is the difference between precision and accuracy of a result? Explain with some
       examples.

   (b) What are two broad categories of errors in data collected by sample surveys? What are
       the methods for reducing sampling errors?


11.9 SIMPLE RANDOM SAMPLING 
AND SAMPLING DISTRIBUTIONS 


As we have already mentioned, a random sample must be chosen in such a way that
it is representative of the population about which we want to make inferences. A random sample
of observations can be chosen in either of two ways: with replacement or without
replacement.


11.10 SAMPLING DISTRIBUTION OF A STATISTIC 


11.10.1 Parameters. The numerical quantities, that describe probability distributions, are called 
parameters. Parameters are fixed constants that characterise a population. 


Parameters are usually denoted by Greek letters. Thus $\pi$ (the probability of success in a
binomial experiment), and $\mu$ and $\sigma$ (the mean and standard deviation of a normal distribution)
are examples of parameters.

Let $x_1, x_2, \ldots, x_N$ be the N elements of a population. A population value summarizes the
values of some characteristic (or characteristics) for all N units of an entire population. It
describes some feature of the distribution of the random variable (or variables) in the defined
population. Let $x_j$, $j = 1, 2, \ldots, N$, be the observed value of some random variable X for the
j-th element in the population; then some examples of population parameters are:

Population total:       $\tau = \sum_{j=1}^{N} x_j$

Population mean:        $\mu = \frac{\sum_{j=1}^{N} x_j}{N}$

Population variance:    $\sigma^2 = \frac{\sum_{j=1}^{N} (x_j - \mu)^2}{N}$

Population proportion:  $\pi = \frac{\text{No. of elements with attribute } A}{\text{Population size}} = \frac{k}{N}$
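
The four population parameters above can be computed directly once every element of the population is known. The following is a minimal sketch with a hypothetical population; the attribute A ("value exceeds 4") is an assumption chosen only to illustrate the proportion.

```python
# Hypothetical population x_1, ..., x_N
population = [2, 4, 6, 8]
N = len(population)

total = sum(population)                                   # population total
mean = total / N                                          # population mean
variance = sum((x - mean) ** 2 for x in population) / N   # population variance (divisor N)

# Attribute A (assumed for illustration): the value exceeds 4
k = sum(1 for x in population if x > 4)
proportion = k / N                                        # population proportion k/N

print(total, mean, variance, proportion)
```

Because all N elements enter the calculation, each of these numbers is a fixed constant for the given population.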


11.10.2 Statistic. A statistic is a function, of the observations of a random sample, which does 
not contain any unknown parameter. 


We know that a number of simple random samples can be drawn from the same 
population and each sample gives a different value of the statistic that is used as an estimator of 
the population parameter. The sample statistic is a random variable having its own probability 
distribution. We intend to use a statistic to make inferences about the distribution of the 
population. A statistic is usually denoted by a small Latin letter ($\bar{x}$, $s$, $r$) to represent its
value obtained from an actually observed sample, and by a capital Latin letter
($\bar{X}$, $S$, $R$) to represent its random nature.

Let $x_1, x_2, \ldots, x_n$ be the observed values of a random sample $X_1, X_2, \ldots, X_n$ of size n
from a given population of N items. A sample value is an estimate calculated from the n
elements in the sample. Let $x_i$, $i = 1, 2, \ldots, n$, be the i-th element in the sample; then the
observed values of some of the sample statistics are:

Sample total:       $\sum_{i=1}^{n} x_i$

Sample mean:        $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Sample variance:    $s^2 = \frac{\sum (x_i - \bar{x})^2}{n}$,   $S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$,   where $n s^2 = (n - 1) S^2$

Sample proportion:  $p = \frac{\text{No. of elements with attribute } A}{\text{Sample size}} = \frac{x}{n}$
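
The sample statistics can be computed in the same way from the n observed values. The sketch below uses a hypothetical sample and an assumed attribute A ("value is even"); it also checks the relation between the two forms of the sample variance, one with divisor n and one with divisor n - 1.

```python
# Hypothetical observed sample x_1, ..., x_n
sample = [4, 8, 6, 2, 5]
n = len(sample)

total = sum(sample)                                   # sample total
x_bar = total / n                                     # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in sample)    # sum of squared deviations

s2 = sum_sq_dev / n                                   # sample variance with divisor n
S2 = sum_sq_dev / (n - 1)                             # sample variance with divisor n - 1

# Attribute A (assumed for illustration): the value is even
x_count = sum(1 for v in sample if v % 2 == 0)
p = x_count / n                                       # sample proportion x/n

print(x_bar, s2, S2, p)
print(n * s2 == (n - 1) * S2)                         # both equal the sum of squared deviations
```

A different sample from the same population would generally give different values, which is why a statistic is treated as a random variable.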


11.10.3 Sampling Distribution of a Statistic. The sampling distribution of a statistic is the
probability distribution of the statistic obtained from all possible samples of some specified size
that can be drawn from a given population.


11.10.4 Standard Error of a Statistic. The standard deviation of the sampling distribution of a
statistic is called the standard error of the statistic.


11.11 SAMPLING DISTRIBUTIONS 
FROM GENERAL POPULATIONS 


We will now look at the most common situations where the Central Limit Theorem is
used to specify approximate probability distributions for sample statistics where sampling is done
from general (non-normal) populations.


The particular sampling distributions we are interested in are those for: (i) the mean,
(ii) the difference between two means, (iii) the proportion of successes, (iv) the difference
between two proportions.


11.12 SAMPLING DISTRIBUTION
OF THE SAMPLE MEAN, $\bar{X}$


The sampling distribution of the sample mean $\bar{X}$ is the probability distribution of the
means of all possible simple random samples of n observations that can be drawn from a given
population with mean $\mu$ and variance $\sigma^2$.


11.12.1 Standard Error of $\bar{X}$. The standard deviation of the sampling distribution of the
sample mean $\bar{X}$, denoted by $\sigma_{\bar{X}}$, is called the standard error of $\bar{X}$.


To discuss the relationships between the population and the sampling distribution of the
sample mean, the following symbols will be used.

N = Population size                          n = Sample size

$\mu$ = Population mean                      $\mu_{\bar{X}}$ = Mean of the distribution of $\bar{X}$

$\sigma^2$ = Population variance             $\sigma_{\bar{X}}^2$ = Variance of the distribution of $\bar{X}$

$\sigma$ = Population standard deviation     $\sigma_{\bar{X}}$ = Standard error of the distribution of $\bar{X}$


11.12.2 Properties of the Sampling Distribution of X. The properties of the sampling 
distribution of the sample mean are given by the following theorems: 


Theorem 11.2 The mean of the sampling distribution of $\bar{X}$, denoted by $\mu_{\bar{X}}$, is equal to the
mean of the sampled population, i.e.,

$\mu_{\bar{X}} = E(\bar{X}) = \mu$

This theorem holds regardless of the sample size n or whether sampling is conducted with or
without replacement.


Theorem 11.3 The variance of the sampling distribution of $\bar{X}$ is equal to the variance of the
sampled population divided by the sample size, i.e.,

$\sigma_{\bar{X}}^2 = \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$

where $\bar{X}$ is the mean of a random sample of size n from an infinite population (or sampling
with replacement) with mean $\mu$ and finite variance $\sigma^2$.

The standard error of $\bar{X}$ then becomes

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

However, if the value of $\sigma$ is unknown, it is replaced by the sample standard deviation S; the
estimate of the standard error of $\bar{X}$ then becomes

$\hat{\sigma}_{\bar{X}} = \frac{S}{\sqrt{n}}$,   where $S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$


Theorem 11.4 The variance of the sampling distribution of $\bar{X}$ is

$\sigma_{\bar{X}}^2 = \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \left( \frac{N - n}{N - 1} \right)$

where $\bar{X}$ is the mean of a random sample of size n drawn without replacement from a finite
population of size N with mean $\mu$ and variance $\sigma^2$. The factor $(N - n)/(N - 1)$ is usually
called the finite population correction (fpc) for variance.

The standard error of $\bar{X}$ then becomes

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

However, if the value of $\sigma$ is unknown, it is replaced by the sample standard deviation S; the
estimate of the standard error of $\bar{X}$ then becomes

$\hat{\sigma}_{\bar{X}} = \frac{S}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$,   where $S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
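
The two variance formulas for the sample mean (infinite population or sampling with replacement, and sampling without replacement from a finite population) can be combined in one small helper. This is only a sketch: the function names and the convention that `N=None` means "no finite population correction" are assumptions made for illustration.

```python
import math
from fractions import Fraction

def variance_of_mean(sigma2, n, N=None):
    """Variance of the sample mean: sigma^2 / n, multiplied by the finite
    population correction (N - n)/(N - 1) when a population size N is given
    (sampling without replacement)."""
    var = Fraction(sigma2) / n
    if N is not None:
        var *= Fraction(N - n, N - 1)
    return var

def standard_error(sigma2, n, N=None):
    """Standard error of the sample mean: square root of its variance."""
    return math.sqrt(variance_of_mean(sigma2, n, N))

# Population variance 5, samples of size 2
print(variance_of_mean(5, 2))            # with replacement
print(variance_of_mean(5, 2, N=4))       # without replacement from N = 4 units
print(standard_error(5, 2, N=4))
print(variance_of_mean(5, 4, N=4))       # n = N: a complete census, no sampling variation
```

When N is very large compared with n, the correction factor is close to 1 and the two formulas give almost the same value.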


Theorem 11.5 If $\bar{X}$ is the mean of a random sample of size n drawn from a normal
population with mean $\mu$ and variance $\sigma^2$ (known), the sampling distribution of $\bar{X}$ is a normal
distribution with mean $\mu$ and variance $\sigma^2 / n$ regardless of the size of the sample (including
sample size 1). The distribution of the standardized sampling errors

$Z = \frac{\bar{X} - \mu_{\bar{X}}}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

will be the standard normal distribution.

Theorem 11.6 (Central Limit Theorem). For a large sample size, the mean $\bar{X}$ of a random
sample from a population with mean $\mu$ and finite variance $\sigma^2$ has a sampling distribution that
is approximately normal with mean $\mu$ and variance $\sigma^2 / n$ regardless of the probability
distribution (shape) of the sampled population. The larger the sample size, the better will be the
normal approximation to the sampling distribution of $\bar{X}$. The distribution of the standardized
sampling errors

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

will approach the standard normal distribution as n tends to infinity.
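
The Central Limit Theorem can be explored numerically without simulation. The sketch below builds the exact sampling distribution of the mean for a hypothetical, strongly right-skewed population and reports a measure of asymmetry (the moment coefficient of skewness, used here as an assumed yardstick for "closeness to the symmetric normal shape") for several sample sizes. The population and the function names are illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical right-skewed population: value -> probability
population = {0: Fraction(7, 10), 1: Fraction(2, 10), 5: Fraction(1, 10)}

def distribution_of_sum(pop, n):
    """Exact distribution of the sum of n independent draws (repeated convolution)."""
    dist = {0: Fraction(1)}
    for _ in range(n):
        new = defaultdict(Fraction)
        for total, p in dist.items():
            for x, px in pop.items():
                new[total + x] += p * px
        dist = dict(new)
    return dist

def moments_of_mean(pop, n):
    """Mean, variance and skewness of the sampling distribution of the sample mean."""
    dist = {Fraction(t, n): p for t, p in distribution_of_sum(pop, n).items()}
    mean = sum(x * p for x, p in dist.items())
    var = sum((x - mean) ** 2 * p for x, p in dist.items())
    third = sum((x - mean) ** 3 * p for x, p in dist.items())
    return mean, var, float(third) / float(var) ** 1.5

for n in (1, 4, 16, 64):
    mean, var, skew = moments_of_mean(population, n)
    print(n, float(mean), float(var), round(skew, 3))
```

The mean stays at the population mean, the variance falls as $\sigma^2 / n$, and the skewness moves toward zero as n increases, which is consistent with the sampling distribution becoming more nearly normal.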
Example 11.1 A population consists of four children with ages 2, 4, 6 and 8. Take all
possible simple random samples of size 2 with replacement. If X is the age of a child, find
(i) the theoretical sampling distribution of $\bar{X}$, the mean age of two children in a sample,
(ii) the mean, variance and standard error of $\bar{X}$,
(iii) the mean, variance and standard deviation of the population.
Verify the results

(i) $\mu_{\bar{X}} = \mu$     (ii) $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$     (iii) $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

Solution. Population: 2, 4, 6, 8; Population size: N = 4; Sample size: n = 2
Number of possible samples = N × N = 4 × 4 = 16


All possible samples that can be drawn with replacement from our population, and their means,
are shown in the following tree diagram.





First    Second    Sample values    Sample sum      Sample mean
draw     draw      x_1, x_2         $\sum x_i$      $\bar{x} = \sum x_i / n$
2        2         2, 2              4              2
2        4         2, 4              6              3
2        6         2, 6              8              4
2        8         2, 8             10              5
4        2         4, 2              6              3
4        4         4, 4              8              4
4        6         4, 6             10              5
4        8         4, 8             12              6
6        2         6, 2              8              4
6        4         6, 4             10              5
6        6         6, 6             12              6
6        8         6, 8             14              7
8        2         8, 2             10              5
8        4         8, 4             12              6
8        6         8, 6             14              7
8        8         8, 8             16              8
Fig. 11.1 A tree diagram showing all possible samples of size 2 drawn with
replacement from a population of the 4 equiprobable values 2, 4, 6, 8
The sampling distribution of the sample mean $\bar{X}$, its mean, variance and standard error are

Value of $\bar{X}$    Number of occurrences    Probability
$\bar{x}$             f                        $p(\bar{x}) = f / \sum f$    $\bar{x}\, p(\bar{x})$    $\bar{x}^2 p(\bar{x})$
2                     1                        1/16                          2/16                      4/16
3                     2                        2/16                          6/16                     18/16
4                     3                        3/16                         12/16                     48/16
5                     4                        4/16                         20/16                    100/16
6                     3                        3/16                         18/16                    108/16
7                     2                        2/16                         14/16                     98/16
8                     1                        1/16                          8/16                     64/16
Sums                  $\sum f = 16$            1                            80/16                    440/16

$\mu_{\bar{X}} = E(\bar{X}) = \sum \bar{x}\, p(\bar{x}) = \frac{80}{16} = 5$

$\sigma_{\bar{X}}^2 = \mathrm{Var}(\bar{X}) = \sum \bar{x}^2 p(\bar{x}) - \mu_{\bar{X}}^2 = \frac{440}{16} - (5)^2 = 2.5$

$\sigma_{\bar{X}} = \sqrt{\mathrm{Var}(\bar{X})} = \sqrt{2.5} = 1.58$


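The mean, variance and standard deviation of the population are

$\mu = \frac{\sum x_j}{N} = \frac{20}{4} = 5$

$\sigma^2 = \frac{\sum x_j^2}{N} - \mu^2 = \frac{120}{4} - (5)^2 = 5$

$\sigma = \sqrt{5} = 2.236$

We are to verify that

(i) $\mu_{\bar{X}} = \mu$     (ii) $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$     (iii) $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

    5 = 5                     $2.5 = \frac{5}{2}$                               $1.58 = \frac{2.236}{\sqrt{2}}$

                              2.5 = 2.5                                         1.58 = 1.58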
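
The enumeration in Example 11.1 can be reproduced with a short script. This is a sketch under the assumptions of the example (ordered samples, drawn with replacement, all equally likely); exact fractions are used so that the checks involve no rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]
n = 2

# All N^n = 16 ordered samples drawn with replacement
samples = list(product(population, repeat=n))
counts = Counter(Fraction(sum(s), n) for s in samples)

# Sampling distribution of the sample mean: value -> probability
dist = {xbar: Fraction(f, len(samples)) for xbar, f in sorted(counts.items())}

mean_xbar = sum(x * p for x, p in dist.items())
var_xbar = sum(x ** 2 * p for x, p in dist.items()) - mean_xbar ** 2

# Population mean and variance
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = Fraction(sum(x ** 2 for x in population), N) - mu ** 2

print(dist)
print(mean_xbar == mu, var_xbar == sigma2 / n)
```

Changing `population` or `n` gives the corresponding sampling distribution for other with-replacement problems of the same kind.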


Example 11.2 A population consists of values 3, 6, and 9. Take all possible simple random
samples of size 3 with replacement. Form the sampling distribution of the sample mean $\bar{X}$. Hence
state and verify the relationship between

(i) the mean of $\bar{X}$ and the population mean,
(ii) the variance of $\bar{X}$ and the population variance,
(iii) the standard error of $\bar{X}$ and the population standard deviation.
Solution. Population: 3, 6, 9; Population size: N = 3; Sample size: n = 3
Number of possible samples = N × N × N = 3 × 3 × 3 = 27


All possible samples that can be drawn with replacement from our population, and the sample
means, are shown in the following tree diagram.








First    Second    Third    Sample values     $\sum x_i$    $\bar{x} = \sum x_i / n$
draw     draw      draw     x_1, x_2, x_3
3        3         3        3, 3, 3            9            3
3        3         6        3, 3, 6           12            4
3        3         9        3, 3, 9           15            5
3        6         3        3, 6, 3           12            4
3        6         6        3, 6, 6           15            5
3        6         9        3, 6, 9           18            6
3        9         3        3, 9, 3           15            5
3        9         6        3, 9, 6           18            6
3        9         9        3, 9, 9           21            7
6        3         3        6, 3, 3           12            4
6        3         6        6, 3, 6           15            5
6        3         9        6, 3, 9           18            6
6        6         3        6, 6, 3           15            5
6        6         6        6, 6, 6           18            6
6        6         9        6, 6, 9           21            7
6        9         3        6, 9, 3           18            6
6        9         6        6, 9, 6           21            7
6        9         9        6, 9, 9           24            8
9        3         3        9, 3, 3           15            5
9        3         6        9, 3, 6           18            6
9        3         9        9, 3, 9           21            7
9        6         3        9, 6, 3           18            6
9        6         6        9, 6, 6           21            7
9        6         9        9, 6, 9           24            8
9        9         3        9, 9, 3           21            7
9        9         6        9, 9, 6           24            8
9        9         9        9, 9, 9           27            9
Fig. 11.2 A tree diagram showing all possible samples of size 3 drawn
with replacement from a population of 3 equiprobable values 3, 6, 9.
The sampling distribution of the sample mean $\bar{X}$, its mean, variance and standard error are

Value of $\bar{X}$    Number of occurrences    Probability
$\bar{x}$             f                        $p(\bar{x}) = f / \sum f$    $\bar{x}\, p(\bar{x})$    $\bar{x}^2 p(\bar{x})$
3                     1                        1/27                          3/27                      9/27
4                     3                        3/27                         12/27                     48/27
5                     6                        6/27                         30/27                    150/27
6                     7                        7/27                         42/27                    252/27
7                     6                        6/27                         42/27                    294/27
8                     3                        3/27                         24/27                    192/27
9                     1                        1/27                          9/27                     81/27
Sums                  $\sum f = 27$            1                           162/27                   1026/27

$\mu_{\bar{X}} = E(\bar{X}) = \sum \bar{x}\, p(\bar{x}) = \frac{162}{27} = 6$

$\sigma_{\bar{X}}^2 = \mathrm{Var}(\bar{X}) = \sum \bar{x}^2 p(\bar{x}) - \mu_{\bar{X}}^2 = \frac{1026}{27} - (6)^2 = 2$

$\sigma_{\bar{X}} = \sqrt{\mathrm{Var}(\bar{X})} = \sqrt{2} = 1.414$


The mean, variance and standard deviation of the population are

$\mu = \frac{\sum x_j}{N} = \frac{18}{3} = 6$

$\sigma^2 = \frac{\sum x_j^2}{N} - \mu^2 = \frac{126}{3} - (6)^2 = 6$

$\sigma = \sqrt{6} = 2.4495$

We are to verify that

(i) $\mu_{\bar{X}} = \mu$     (ii) $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$     (iii) $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

    6 = 6                     $2 = \frac{6}{3}$                                 $1.414 = \frac{2.4495}{\sqrt{3}}$

                              2 = 2                                             1.414 = 1.414


Example 11.3 A random variable X has the following probability distribution.

x       4      5      6
p(x)    0.3    0.5    0.2

If a sample of size 2 is taken with replacement, obtain the sampling distribution of $\bar{X}$.
Determine the mean and variance of the sampling distribution. Find the mean and variance of the
population. Discuss the results.

Solution. We have an infinite population. Since the sample is drawn at random with replacement
from the infinite population, the sample values are independent. Thus the distribution of possible
samples of size n = 2 drawn with replacement is


Sample values    Sample total    Sample mean                  Probability
x_1, x_2         $\sum x_i$      $\bar{x} = \sum x_i / 2$      $p(x_1, x_2)$
4, 4              8              4.0                          (0.3)(0.3) = 0.09
4, 5              9              4.5                          (0.3)(0.5) = 0.15
4, 6             10              5.0                          (0.3)(0.2) = 0.06
5, 4              9              4.5                          (0.5)(0.3) = 0.15
5, 5             10              5.0                          (0.5)(0.5) = 0.25
5, 6             11              5.5                          (0.5)(0.2) = 0.10
6, 4             10              5.0                          (0.2)(0.3) = 0.06
6, 5             11              5.5                          (0.2)(0.5) = 0.10
6, 6             12              6.0                          (0.2)(0.2) = 0.04


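The sampling distribution of the sample mean $\bar{X}$, its mean, variance and standard error are

Value of $\bar{X}$    Probability
$\bar{x}$             $p(\bar{x})$                      $\bar{x}\, p(\bar{x})$    $\bar{x}^2 p(\bar{x})$
4.0                   0.09                              0.36                      1.440
4.5                   0.15 + 0.15 = 0.30                1.35                      6.075
5.0                   0.06 + 0.25 + 0.06 = 0.37         1.85                      9.250
5.5                   0.10 + 0.10 = 0.20                1.10                      6.050
6.0                   0.04                              0.24                      1.440
Sums                  1                                 4.9                      24.255

$\mu_{\bar{X}} = E(\bar{X}) = \sum \bar{x}\, p(\bar{x}) = 4.9$

$\sigma_{\bar{X}}^2 = \mathrm{Var}(\bar{X}) = \sum \bar{x}^2 p(\bar{x}) - \mu_{\bar{X}}^2 = 24.255 - (4.9)^2 = 0.245$

$\sigma_{\bar{X}} = \sqrt{\mathrm{Var}(\bar{X})} = \sqrt{0.245} = 0.495$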


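The mean, variance and standard deviation of the population are

$x_i$               4      5      6
$p(x_i)$            0.3    0.5    0.2
$x_i\, p(x_i)$      1.2    2.5    1.2     $\sum x_i\, p(x_i) = 4.9$
$x_i^2\, p(x_i)$    4.8   12.5    7.2     $\sum x_i^2\, p(x_i) = 24.5$

$\mu = E(X) = \sum x_i\, p(x_i) = 4.9$

$\sigma^2 = \mathrm{Var}(X) = \sum x_i^2\, p(x_i) - \mu^2 = 24.5 - (4.9)^2 = 0.49$

$\sigma = \sqrt{0.49} = 0.7$

We are to verify that:

(i) $\mu_{\bar{X}} = \mu$     (ii) $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$     (iii) $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

    4.9 = 4.9                 $0.245 = \frac{0.49}{2} = 0.245$                  $0.495 = \frac{0.7}{\sqrt{2}} = 0.495$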
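
When the population is given as a probability distribution rather than a list of equally likely values, each sample is weighted by the product of the probabilities of its values. The sketch below follows the set-up of Example 11.3 (values 4, 5, 6 with probabilities 0.3, 0.5, 0.2, samples of size 2); the variable names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Probability distribution of X used in Example 11.3
p = {4: Fraction(3, 10), 5: Fraction(5, 10), 6: Fraction(2, 10)}
n = 2

# Sampling distribution of the mean: each ordered sample has probability
# equal to the product of the probabilities of its (independent) values.
dist = defaultdict(Fraction)
for sample in product(p, repeat=n):
    prob = Fraction(1)
    for x in sample:
        prob *= p[x]
    dist[Fraction(sum(sample), n)] += prob

mean_xbar = sum(x * q for x, q in dist.items())
var_xbar = sum(x ** 2 * q for x, q in dist.items()) - mean_xbar ** 2

mu = sum(x * q for x, q in p.items())
sigma2 = sum(x ** 2 * q for x, q in p.items()) - mu ** 2

print(dict(dist))
print(float(mean_xbar), float(var_xbar), float(mu), float(sigma2))
```

The output can be compared line by line with the tables of the example.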


Example 11.4 A population consists of values 3, 5, 7 and 9. Take all possible simple random
samples of size 2 without replacement. Form the sampling distribution of the sample mean $\bar{X}$. Find
the mean, variance and standard error of $\bar{X}$. Find the mean, variance and standard deviation of
the population. Verify that:

(i) $\mu_{\bar{X}} = \mu$     (ii) $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \left( \frac{N - n}{N - 1} \right)$     (iii) $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

Solution. Population: 3, 5, 7, 9; Population size: N = 4; Sample size: n = 2
Number of possible samples = N(N − 1) = 4(4 − 1) = 12


All possible samples that can be drawn without replacement from our population, and their means,
are shown in the following tree diagram.

First    Second    Sample values    $\sum x_i$    $\bar{x} = \sum x_i / n$
draw     draw      x_1, x_2
3        5         3, 5              8            4
3        7         3, 7             10            5
3        9         3, 9             12            6
5        3         5, 3              8            4
5        7         5, 7             12            6
5        9         5, 9             14            7
7        3         7, 3             10            5
7        5         7, 5             12            6
7        9         7, 9             16            8
9        3         9, 3             12            6
9        5         9, 5             14            7
9        7         9, 7             16            8

Fig. 11.3 A tree diagram showing all possible samples of size 2 drawn
without replacement from a population of 4 equiprobable values 3, 5, 7, 9.


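The sampling distribution of the sample mean $\bar{X}$, its mean, variance and standard error are

Value of $\bar{X}$    Number of occurrences    Probability
$\bar{x}$             f                        $p(\bar{x}) = f / \sum f$    $\bar{x}\, p(\bar{x})$    $\bar{x}^2 p(\bar{x})$
4                     2                        2/12                          8/12                     32/12
5                     2                        2/12                         10/12                     50/12
6                     4                        4/12                         24/12                    144/12
7                     2                        2/12                         14/12                     98/12
8                     2                        2/12                         16/12                    128/12
Sums                  $\sum f = 12$            1                            72/12                    452/12

$\mu_{\bar{X}} = E(\bar{X}) = \sum \bar{x}\, p(\bar{x}) = \frac{72}{12} = 6$

$\sigma_{\bar{X}}^2 = \mathrm{Var}(\bar{X}) = \sum \bar{x}^2 p(\bar{x}) - \mu_{\bar{X}}^2 = \frac{452}{12} - (6)^2 = 1.667$

$\sigma_{\bar{X}} = \sqrt{\mathrm{Var}(\bar{X})} = \sqrt{1.667} = 1.291$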


The mean, variance and standard deviation of the population are

$\mu = \frac{\sum x_j}{N} = \frac{24}{4} = 6$

$\sigma^2 = \frac{\sum x_j^2}{N} - \mu^2 = \frac{164}{4} - (6)^2 = 5$

$\sigma = \sqrt{5} = 2.236$

We are to verify that

(i) $\mu_{\bar{X}} = \mu$     (ii) $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \left( \frac{N - n}{N - 1} \right)$     (iii) $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

    6 = 6                     $1.667 = \frac{5}{2} \left( \frac{4 - 2}{4 - 1} \right)$               $1.291 = \frac{2.236}{\sqrt{2}} \sqrt{\frac{4 - 2}{4 - 1}}$

                              1.667 = 1.667                                                          1.291 = 1.291
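
For sampling without replacement, the ordered samples of Example 11.4 are the permutations of the population taken n at a time. The following sketch enumerates them and checks the finite population correction exactly; it also shows, as an aside, that counting unordered samples (combinations) leads to the same sampling distribution of the mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations

population = [3, 5, 7, 9]
N, n = len(population), 2

def distribution_of_mean(samples, n):
    """Sampling distribution of the mean when all listed samples are equally likely."""
    samples = list(samples)
    counts = Counter(Fraction(sum(s), n) for s in samples)
    return {xbar: Fraction(f, len(samples)) for xbar, f in sorted(counts.items())}

def mean_and_variance(dist):
    m = sum(x * p for x, p in dist.items())
    return m, sum(x ** 2 * p for x, p in dist.items()) - m ** 2

ordered = distribution_of_mean(permutations(population, n), n)      # 12 samples
unordered = distribution_of_mean(combinations(population, n), n)    # 6 samples

mean_xbar, var_xbar = mean_and_variance(ordered)

mu = Fraction(sum(population), N)
sigma2 = Fraction(sum(x ** 2 for x in population), N) - mu ** 2
fpc = Fraction(N - n, N - 1)

print(mean_xbar, var_xbar, sigma2 / n * fpc)
print(ordered == unordered)
```

Each unordered sample corresponds to the same number of ordered samples, which is why the two listings give identical probabilities.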


Example 11.5 A population consists of values 0, 3, 6 and 9. Take all possible simple
random samples of size 3 without replacement. Form the sampling distribution of the sample mean
$\bar{X}$. Hence state and verify the relationship between

(i) the mean of $\bar{X}$ and the population mean,
(ii) the variance of $\bar{X}$ and the population variance,
(iii) the standard error of $\bar{X}$ and the population standard deviation.
Solution. Population: 0, 3, 6, 9; Population size: N = 4; Sample size: n = 3
Number of possible samples = N(N − 1)(N − 2) = 4(4 − 1)(4 − 2) = 24


All possible samples that can be drawn without replacement from our population, and the sample
means, are shown in the following tree diagram.








First    Second    Third    Sample values     $\sum x_i$    $\bar{x} = \sum x_i / n$
draw     draw      draw     x_1, x_2, x_3
0        3         6        0, 3, 6            9            3
0        3         9        0, 3, 9           12            4
0        6         3        0, 6, 3            9            3
0        6         9        0, 6, 9           15            5
0        9         3        0, 9, 3           12            4
0        9         6        0, 9, 6           15            5
3        0         6        3, 0, 6            9            3
3        0         9        3, 0, 9           12            4
3        6         0        3, 6, 0            9            3
3        6         9        3, 6, 9           18            6
3        9         0        3, 9, 0           12            4
3        9         6        3, 9, 6           18            6
6        0         3        6, 0, 3            9            3
6        0         9        6, 0, 9           15            5
6        3         0        6, 3, 0            9            3
6        3         9        6, 3, 9           18            6
6        9         0        6, 9, 0           15            5
6        9         3        6, 9, 3           18            6
9        0         3        9, 0, 3           12            4
9        0         6        9, 0, 6           15            5
9        3         0        9, 3, 0           12            4
9        3         6        9, 3, 6           18            6
9        6         0        9, 6, 0           15            5
9        6         3        9, 6, 3           18            6


Fig. 11.4 A tree diagram showing all possible samples of size 3 drawn without
replacement from a population of 4 equiprobable values 0, 3, 6, 9.


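The sampling distribution of the sample mean $\bar{X}$, its mean, variance and standard error are

Value of $\bar{X}$    Number of occurrences    Probability
$\bar{x}$             f                        $p(\bar{x}) = f / \sum f$    $\bar{x}\, p(\bar{x})$    $\bar{x}^2 p(\bar{x})$
3                     6                        6/24                         18/24                     54/24
4                     6                        6/24                         24/24                     96/24
5                     6                        6/24                         30/24                    150/24
6                     6                        6/24                         36/24                    216/24
Sums                  $\sum f = 24$            1                           108/24                    516/24

$\mu_{\bar{X}} = E(\bar{X}) = \sum \bar{x}\, p(\bar{x}) = \frac{108}{24} = 4.5$

$\sigma_{\bar{X}}^2 = \mathrm{Var}(\bar{X}) = \sum \bar{x}^2 p(\bar{x}) - \mu_{\bar{X}}^2 = \frac{516}{24} - (4.5)^2 = 1.25$

$\sigma_{\bar{X}} = \sqrt{\mathrm{Var}(\bar{X})} = \sqrt{1.25} = 1.118$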


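The mean, variance and standard deviation of the population are

$\mu = \frac{\sum x_j}{N} = \frac{18}{4} = 4.5$

$\sigma^2 = \frac{\sum x_j^2}{N} - \mu^2 = \frac{126}{4} - (4.5)^2 = 11.25$

$\sigma = \sqrt{11.25} = 3.3541$

We are to verify that

(i) $\mu_{\bar{X}} = \mu$     (ii) $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \left( \frac{N - n}{N - 1} \right)$     (iii) $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

    4.5 = 4.5                 $1.25 = \frac{11.25}{3} \left( \frac{4 - 3}{4 - 1} \right)$            $1.118 = \frac{3.3541}{\sqrt{3}} \sqrt{\frac{4 - 3}{4 - 1}}$

                              1.25 = 1.25                                                            1.118 = 1.118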
Example 11.6 A random variable X has the following probability distribution. 







If a simple random sample of 3 numbers is taken without replacement, obtain the sampling
distribution of the sample mean $\bar{X}$. Find the mean, variance and standard error of $\bar{X}$.

Solution. We have an infinite population, so the actual sampling distribution of $\bar{X}$, the sample
mean of three numbers taken without replacement, cannot be listed sample by sample.
Since the sample is drawn at random without replacement from the infinite population,
the sample values are in effect independent. The sampling distribution of $\bar{X}$ therefore
virtually becomes the sampling distribution of the sample mean of three numbers taken
with replacement.


The population size N is infinite and n = 3. Then the finite population correction

$\frac{N - n}{N - 1} \to 1$ as $N \to \infty$,   and   $\frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} \to \frac{\sigma}{\sqrt{n}}$


The mean, variance and standard deviation of the population are

$p(x_i)$     0.2    0.4    0.4     $\sum p(x_i) = 1$
                                   $\sum x_i\, p(x_i) = 4.2$
                                   $\sum x_i^2\, p(x_i) = 18.2$

$\mu = E(X) = \sum x\, p(x) = 4.2$

$\sigma^2 = \mathrm{Var}(X) = \sum x^2 p(x) - \mu^2 = 18.2 - (4.2)^2 = 0.56$

The mean, variance and standard error of $\bar{X}$ are

$\mu_{\bar{X}} = \mu = 4.2$

$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} = \frac{0.56}{3} = 0.187$

$\sigma_{\bar{X}} = \sqrt{0.187} = 0.432$


Example 11.7 The weights of 1000 students of a college are normally distributed with mean
68.5 kg and standard deviation 2.7 kg. If a simple random sample of 25 students is obtained
from this population, find the expected mean and standard deviation of the sampling distribution
of means if sampling were done (i) with replacement and (ii) without replacement.


Solution. We have
Population mean: $\mu = 68.5$, Population standard deviation: $\sigma = 2.7$
Population size: N = 1000, Sample size: n = 25

(i) Sampling with replacement:

$\mu_{\bar{X}} = \mu = 68.5$ kg

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{2.7}{\sqrt{25}} = 0.54$ kg

(ii) Sampling without replacement:

$\mu_{\bar{X}} = \mu = 68.5$ kg

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.7}{\sqrt{25}} \sqrt{\frac{1000 - 25}{1000 - 1}} = 0.53$ kg
Example 11.8 Given the population 1, 1, 1, 3, 4, 5, 6, 6, 6, and 7.

(a) Find the mean and standard deviation for the sampling distribution of the mean for a
    sample of size 36 selected at random with replacement.

(b) Find the mean and standard deviation for the sampling distribution of the mean for a
    sample of size 4 selected at random without replacement.

Solution. The mean and standard deviation of the population are:

$\mu = \frac{\sum x}{N} = \frac{40}{10} = 4$

$\sigma = \sqrt{\frac{\sum x^2}{N} - \mu^2} = \sqrt{\frac{210}{10} - (4)^2} = \sqrt{5} = 2.236$

(a) Sampling With Replacement. We have sample size n = 36. The mean and standard
error of $\bar{X}$ are

$\mu_{\bar{X}} = \mu = 4$

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{2.236}{\sqrt{36}} = 0.373$

(b) Sampling Without Replacement. We have sample size n = 4. The mean and standard
error of $\bar{X}$ are:

$\mu_{\bar{X}} = \mu = 4$

$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}} = \frac{2.236}{\sqrt{4}} \sqrt{\frac{10 - 4}{10 - 1}} = 0.913$











Exercise 11.2 


1. (a) How do you define a population and a sample? Differentiate between parameter and
       statistic. Why is a parameter said to be a constant and a statistic a variable?

   (b) A labour union has 1000 members. A random sample of 50 members of the union
       gave an average age of 40 years. The average age of the members of the labour union
       was, therefore, estimated to be 40 years. A complete enumeration of all the members
       indicated that the true mean age was 43 years. Answer the following:
       (i) Which figure is a parameter?
       (ii) Which figure is a statistic?
       { (i) Population size N = 1000, and population mean age $\mu$ = 43 years;
         (ii) Sample size n = 50, and sample mean $\bar{x}$ = 40 years. }

2. (a) What is meant by a sampling distribution and a standard error? Describe the properties
       of the sampling distribution of the sample mean.

   (b) What is meant by standard error and what are its practical uses?

   (c) What is the finite population correction factor? When is it appropriately used in
       sampling applications and when can it, without too great an undesirable consequence, be
       ignored?

3. (a) A finite population consists of the numbers 2, 4, 6, 8, 10 and 12. Calculate the
       sample means for all possible random samples of size n = 2 that can be drawn from
       this population, with replacement. Assuming the 36 possible samples equally likely,
       make the sampling distribution of sample means and find the mean and variance of this
       distribution. Calculate the mean and variance of the population and verify that

       (i) $\mu_{\bar{X}} = \mu$     (ii) $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n}$

       { $\mu = 7$, $\sigma^2 = 11.667$, $\mu_{\bar{X}} = 7$, $\sigma_{\bar{X}}^2 = 5.833$ }

   (b) A finite population consists of the numbers 2, 4, 6, 6, 8 and 10. Calculate the sample
       means for all possible random samples of size n = 2 that can be drawn from this
       population, with replacement. Assuming the 36 possible samples equally likely, form
       the sampling distribution of sample means and find the mean and variance of this
       distribution. Calculate the mean and variance of the population and verify that

       (i) $\mu_{\bar{X}} = \mu$     (ii) $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$

       { $\mu = 6$, $\mu_{\bar{X}} = 6$, $\sigma = 2.582$, $\sigma_{\bar{X}} = 1.826$ }

   (c) Draw all possible samples of size n = 3 with replacement from the population 3, 6, 9
       and 12. Assuming the 64 possible samples equally likely, form a sampling distribution
       of the sample means. Hence state and verify the relation between
       (i) the mean of the sampling distribution of the sample mean and the population
           mean;
       (ii) the variance of the sampling distribution of the sample mean and the population
           variance.
       { $\mu = 7.5$, $\mu_{\bar{X}} = 7.5$, $\mu_{\bar{X}} = \mu$; $\sigma^2 = 11.25$, $\sigma_{\bar{X}}^2 = 3.75$, $\sigma_{\bar{X}}^2 = \sigma^2 / n$ }

4. (a) A finite population consists of the numbers 2, 4, 6, 6, 8 and 10. Calculate the sample
       means for all possible random samples of size n = 2 that can be drawn from this
       population, without replacement. Assuming the 30 possible samples equally likely,
       make the sampling distribution of the sample mean. Find the mean and variance of this
       distribution. Calculate the mean and variance of the population and verify that

       (i) $\mu_{\bar{X}} = \mu$     (ii) $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$

       { $\mu = 6$, $\mu_{\bar{X}} = 6$, $\sigma = 2.582$, $\sigma_{\bar{X}} = 1.633$ }

   (b) A finite population consists of the values 6, 6, 9, 15 and 18. Calculate the sample
       means for all possible random samples of size n = 3 that can be drawn from this
       population, without replacement. Assuming the 60 possible samples equally likely,
       make the sampling distribution of the sample mean and find the mean and variance of this
       distribution. Calculate the mean and variance of the population and show that

       (i) $\mu_{\bar{X}} = \mu$     (ii) $\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} \left( \frac{N - n}{N - 1} \right)$

       { $\mu = 10.8$, $\sigma^2 = 23.76$, $\mu_{\bar{X}} = 10.8$, $\sigma_{\bar{X}}^2 = 3.96$ }

   (c) Find the mean $\mu$ and variance $\sigma^2$ of the finite population 1, 4, 7 and 8. Take all
       possible samples of size 2 that can be drawn at random without replacement from this
       population. Assuming the 12 possible samples equally likely, make the sampling
       distribution of the sample mean and find the mean and variance of this distribution. Verify
       that

       (i) $E(\bar{X}) = \mu$     (ii) $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} \left( \frac{N - n}{N - 1} \right)$

       where $\bar{X}$ is the random variable 'the sample mean', N is the population size and n is
       the sample size. What happens as $N \to \infty$?

       { $\mu = 5$, $\sigma^2 = 7.5$, $E(\bar{X}) = 5$, $\mathrm{Var}(\bar{X}) = 2.5$ }

5. (a) In an infinite population $\mu = 50$ and $\sigma^2 = 250$. Find the mean and variance for the
       distribution of $\bar{X}$ if:

       (i) n = 25,     (ii) n = 100,     (iii) n = 1250

       { (i) $\mu_{\bar{X}} = 50$, $\sigma_{\bar{X}}^2 = 10$   (ii) $\mu_{\bar{X}} = 50$, $\sigma_{\bar{X}}^2 = 2.5$   (iii) $\mu_{\bar{X}} = 50$, $\sigma_{\bar{X}}^2 = 0.2$ }

   (b) A large number of samples of size 50 were selected at random from a normal
       population with mean $\mu$ and variance $\sigma^2$. The mean and standard error of the sampling
       distribution of the sample mean were found to be 2500 and 4 respectively. Find the mean
       and variance of the population.
       ( 2500, 800 )

6. (a) If the size of the simple random sample from an infinite population is 55 and the variance
       of the sample mean is 27, what must be the standard error of the sample mean if n = 165?
       ( $\sigma_{\bar{X}} = 3$ )

   (b) If the size of the simple random sample from an infinite population is 36 and the
       standard error of the mean is 2, what must the size of the sample become if the standard
       error is to be reduced to 1.2?
       ( n = 100 )

7. (a) The random variable X has the following probability distribution:




       Find the mean $\mu_{\bar{X}}$, variance $\sigma_{\bar{X}}^2$ and standard error $\sigma_{\bar{X}}$ of the mean $\bar{X}$ for a random
       sample of size 36.
       ( $\mu_{\bar{X}} = 5.3$, $\sigma_{\bar{X}}^2 = 0.0225$, $\sigma_{\bar{X}} = 0.15$ )

   (b) A random sample of 36 cases is drawn from a negatively skewed probability
       distribution with a mean of 2 and a standard deviation of 3. Find the mean and
       standard error of the sampling distribution of $\bar{X}$.
       ( $\mu_{\bar{X}} = 2$, $\sigma_{\bar{X}} = 0.5$ )

   (c) A random sample of 100 is taken from a population with mean 30 and standard
       deviation 5. The probability distribution of the parent population is unknown; find the
       mean and standard error of the sampling distribution of $\bar{X}$.


11.13 SAMPLING DISTRIBUTION OF THE DIFFERENCE 
BETWEEN TWO SAMPLE MEANS, X, - ¥; 


The sampling distribution of the difference between two sample means $\bar X_1 - \bar X_2$ is the probability distribution of all possible differences between means $\bar X_1$ and $\bar X_2$ obtained from all possible independent simple random samples of $n_1$ and $n_2$ observations that can be drawn from two given populations with means $\mu_1$, $\mu_2$ and variances $\sigma_1^2$, $\sigma_2^2$ respectively.

Often we wish to compare the means of two random variables. The comparison is made on the basis of two independent random samples drawn from the given populations.

Suppose that two independent random samples of sizes $n_1$ and $n_2$ are drawn from populations with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Let $\bar X_1$ be the mean of the sample of size $n_1$ from the population with mean $\mu_1$ and variance $\sigma_1^2$; then $\bar X_1$ is a random variable that has its own probability distribution with mean $\mu_1$ and variance $\sigma_1^2/n_1$. Let $\bar X_2$ be the mean of the sample of size $n_2$ from the population with mean $\mu_2$ and variance $\sigma_2^2$; then $\bar X_2$ is a random variable that has its own probability distribution with mean $\mu_2$ and variance $\sigma_2^2/n_2$.

Then the differences $\bar X_1 - \bar X_2$ can be obtained from all possible pairs of $\bar X_1$ and $\bar X_2$. Consequently, the difference $\bar X_1 - \bar X_2$ between two sample means is a random variable that has its own probability distribution, which is called the sampling distribution of the difference between two sample means.


11.13.1 Properties of the Sampling Distribution of the Difference between Two Sample Means. The properties of the sampling distribution of the difference $\bar X_1 - \bar X_2$ between two sample means are given by the following theorems:

Theorem 11.7 The mean of the sampling distribution of $(\bar X_1 - \bar X_2)$, denoted by $\mu_{\bar X_1 - \bar X_2}$, is equal to the difference between the population means, i.e.,

$$\mu_{\bar X_1 - \bar X_2} = E(\bar X_1 - \bar X_2) = \mu_1 - \mu_2$$

This theorem holds regardless of the sample sizes $n_1$ and $n_2$ or whether sampling is done with or without replacement.

Theorem 11.8 The variance of the sampling distribution of $(\bar X_1 - \bar X_2)$, denoted by $\sigma^2_{\bar X_1 - \bar X_2}$, is equal to the sum of the variances of the sampled populations divided by the respective sample sizes, i.e.,

$$\sigma^2_{\bar X_1 - \bar X_2} = \mathrm{Var}(\bar X_1 - \bar X_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

where $\bar X_1$ and $\bar X_2$ are the means of two independent random samples of sizes $n_1$ and $n_2$ from infinite populations (or sampling with replacement) with means $\mu_1$ and $\mu_2$ and finite variances $\sigma_1^2$ and $\sigma_2^2$ respectively.










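The standard error of $(\bar X_1 - \bar X_2)$ then becomes

$$\sigma_{\bar X_1 - \bar X_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

However, if $\sigma_1^2$ and $\sigma_2^2$ are unknown, these are replaced by the sample variances $\hat S_1^2$ and $\hat S_2^2$; the estimate of the standard error of $(\bar X_1 - \bar X_2)$ then becomes

$$\hat\sigma_{\bar X_1 - \bar X_2} = \sqrt{\frac{\hat S_1^2}{n_1} + \frac{\hat S_2^2}{n_2}}$$

where $\hat S_1^2 = \dfrac{\sum (X_{1i} - \bar X_1)^2}{n_1 - 1}$ and $\hat S_2^2 = \dfrac{\sum (X_{2i} - \bar X_2)^2}{n_2 - 1}$.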





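Theorem 11.9 The variance of the sampling distribution of $(\bar X_1 - \bar X_2)$ is

$$\sigma^2_{\bar X_1 - \bar X_2} = \mathrm{Var}(\bar X_1 - \bar X_2) = \frac{\sigma_1^2}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)$$

where $\bar X_1$ and $\bar X_2$ are the means of random samples of sizes $n_1$ and $n_2$ drawn without replacement from finite populations of sizes $N_1$ and $N_2$ with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$ respectively.

The standard error of $(\bar X_1 - \bar X_2)$ then becomes

$$\sigma_{\bar X_1 - \bar X_2} = \sqrt{\frac{\sigma_1^2}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\sigma_2^2}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)}$$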


Theorem 11.10 If $\bar X_1$ and $\bar X_2$ are the means of random samples of $n_1$ and $n_2$ observations from two independent normal populations with means $\mu_1$, $\mu_2$ and variances $\sigma_1^2$, $\sigma_2^2$ respectively, then the sampling distribution of the difference between sample means $\bar X_1 - \bar X_2$ is normal with mean and variance

$$\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2$$

$$\sigma^2_{\bar X_1 - \bar X_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

That is, the distribution of the random variable

$$Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

is a standard normal distribution.




Example 11.9 Let $\bar X_1$ represent the mean of a sample of size $n_1 = 2$ selected at random with replacement from a finite population consisting of values 7 and 9. Similarly, let $\bar X_2$ represent the mean of a sample of size $n_2 = 3$ selected at random with replacement from another finite population consisting of values 3 and 6. Form a sampling distribution of the random variable $(\bar X_1 - \bar X_2)$. Verify that

(i) $\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2$, (ii) $\sigma^2_{\bar X_1 - \bar X_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$

Solution. We have
Population I: 7, 9; $N_1 = 2$; $n_1 = 2$
Number of possible samples = $N_1 \times N_1 = 2 \times 2 = 4$
Possible samples: (7, 7), (7, 9), (9, 7), (9, 9)
Sample means $\bar x_1$: 7, 8, 8, 9
Population II: 3, 6; $N_2 = 2$; $n_2 = 3$
Number of possible samples = $N_2 \times N_2 \times N_2 = 2 \times 2 \times 2 = 8$
Possible samples: (3, 3, 3), (3, 3, 6), (3, 6, 3), (6, 3, 3),
                  (3, 6, 6), (6, 3, 6), (6, 6, 3), (6, 6, 6)
Sample means $\bar x_2$: 3, 4, 4, 4, 5, 5, 5, 6


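All possible differences between sample means $(\bar x_1 - \bar x_2)$ are

                 $\bar x_2$:   3   4   4   4   5   5   5   6
$\bar x_1 = 7$                 4   3   3   3   2   2   2   1
$\bar x_1 = 8$                 5   4   4   4   3   3   3   2
$\bar x_1 = 8$                 5   4   4   4   3   3   3   2
$\bar x_1 = 9$                 6   5   5   5   4   4   4   3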


The sampling distribution of $\bar X_1 - \bar X_2$, its mean and variance are

Value of                 Number of        Probability
$\bar x_1 - \bar x_2$    occurrences f    $p = f/\sum f$    $(\bar x_1 - \bar x_2)\,p$    $(\bar x_1 - \bar x_2)^2\,p$
1                        1                1/32              1/32                          1/32
2                        5                5/32              10/32                         20/32
3                        10               10/32             30/32                         90/32
4                        10               10/32             40/32                         160/32
5                        5                5/32              25/32                         125/32
6                        1                1/32              6/32                          36/32
Sum                      32               1                 112/32                        432/32

$$\mu_{\bar X_1 - \bar X_2} = E(\bar X_1 - \bar X_2) = \sum (\bar x_1 - \bar x_2)\,p(\bar x_1 - \bar x_2) = \frac{112}{32} = 3.5$$

$$\sigma^2_{\bar X_1 - \bar X_2} = \mathrm{Var}(\bar X_1 - \bar X_2) = \sum (\bar x_1 - \bar x_2)^2\,p(\bar x_1 - \bar x_2) - \mu^2_{\bar X_1 - \bar X_2} = \frac{432}{32} - (3.5)^2 = 1.25$$


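The mean and variance of population I are

$$\mu_1 = \frac{7 + 9}{2} = 8, \qquad \sigma_1^2 = \frac{7^2 + 9^2}{2} - \mu_1^2 = \frac{130}{2} - (8)^2 = 1$$

The mean and variance of population II are

$$\mu_2 = \frac{3 + 6}{2} = 4.5, \qquad \sigma_2^2 = \frac{3^2 + 6^2}{2} - \mu_2^2 = \frac{45}{2} - (4.5)^2 = 2.25$$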








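We are to verify that

(i) $\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2$:   3.5 = 8 - 4.5, that is, 3.5 = 3.5

(ii) $\sigma^2_{\bar X_1 - \bar X_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$:   $1.25 = \dfrac{1}{2} + \dfrac{2.25}{3}$, that is, 1.25 = 1.25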
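The enumeration in Example 11.9 can also be checked mechanically. The following is a minimal Python sketch, not part of the original text; the helper names `mean_sampling_dist` and `mean_var` are made up for illustration, and exact fractions are used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction
from itertools import product

def mean_sampling_dist(population, n):
    """Exact means of all ordered samples of size n drawn with replacement."""
    return [Fraction(sum(s), n) for s in product(population, repeat=n)]

def mean_var(values):
    """Mean and variance of a list of equally likely values."""
    m = Fraction(sum(values), len(values))
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

pop1, n1 = [7, 9], 2   # population I of Example 11.9
pop2, n2 = [3, 6], 3   # population II of Example 11.9

# Every pairing of a mean from population I with a mean from population II
diffs = [a - b for a in mean_sampling_dist(pop1, n1)
               for b in mean_sampling_dist(pop2, n2)]

mu_diff, var_diff = mean_var(diffs)
mu1, var1 = mean_var(pop1)
mu2, var2 = mean_var(pop2)

print(len(diffs), mu_diff, var_diff)
print(mu_diff == mu1 - mu2)                  # Theorem 11.7
print(var_diff == var1 / n1 + var2 / n2)     # Theorem 11.8
```

Changing `pop1`, `pop2`, `n1` and `n2` lets the same check be repeated for the with-replacement parts of Exercise 11.3.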


Example 11.10 Two independent random samples of sizes $n_1 = 30$ and $n_2 = 50$ are taken from two populations having means $\mu_1 = 78$ and $\mu_2 = 75$ and variances $\sigma_1^2 = 150$ and $\sigma_2^2 = 200$. Let $\bar X_1$ be the mean of the first random sample and $\bar X_2$ be the mean of the second random sample. Find the mean and standard error of $\bar X_1 - \bar X_2$.


Solution. We have

$\mu_1 = 78$, $\sigma_1^2 = 150$, $n_1 = 30$
$\mu_2 = 75$, $\sigma_2^2 = 200$, $n_2 = 50$

Then the mean and standard error of $\bar X_1 - \bar X_2$ are

$$\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2 = 78 - 75 = 3$$

$$\sigma_{\bar X_1 - \bar X_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{150}{30} + \frac{200}{50}} = 3$$
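The calculation in Example 11.10 follows directly from Theorems 11.7 and 11.8, and the finite population case of Theorem 11.9 only multiplies each term by its correction factor. A small Python sketch under these assumptions is given below; the function name `diff_means_mean_se` and its optional arguments `N1` and `N2` are hypothetical and chosen only for illustration.

```python
import math

def diff_means_mean_se(mu1, var1, n1, mu2, var2, n2, N1=None, N2=None):
    """Mean and standard error of the difference between two sample means.

    If N1 and N2 are given, sampling is assumed to be without replacement
    from finite populations and each variance term is multiplied by
    (N - n) / (N - 1), as in Theorem 11.9.
    """
    term1 = var1 / n1
    term2 = var2 / n2
    if N1 is not None:
        term1 *= (N1 - n1) / (N1 - 1)
    if N2 is not None:
        term2 *= (N2 - n2) / (N2 - 1)
    return mu1 - mu2, math.sqrt(term1 + term2)

# Figures of Example 11.10
mean, se = diff_means_mean_se(78, 150, 30, 75, 200, 50)
print(mean, se)
```

Passing `N1` and `N2` switches the same function to the without-replacement formula.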


Exercise 11.3


1. (a) What is meant by the sampling distribution of the difference between two sample means? Describe the properties of the sampling distribution of the difference between two sample means.

(b) Let $\bar X_1$ represent the mean of a sample of size $n_1 = 2$, selected with replacement from a finite population -2, 0, 2 and 4. Similarly, let $\bar X_2$ represent the mean of a sample of size $n_2 = 2$, selected with replacement from the population -1 and 1.

(i) Assuming that the 64 possible differences $\bar X_1 - \bar X_2$ are equally likely to occur, construct the sampling distribution of $\bar X_1 - \bar X_2$.

(ii) Verify that $\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2$ and $\sigma^2_{\bar X_1 - \bar X_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$.

{$\mu_1 = 1$, $\mu_2 = 0$, $\mu_{\bar X_1 - \bar X_2} = 1$, $\sigma_1^2 = 5$, $\sigma_2^2 = 1$, $\sigma^2_{\bar X_1 - \bar X_2} = 3$}
2. (a) Let the variable $\bar X_1$ represent the means of random samples of size $n_1 = 2$, drawn with replacement from the finite population 3, 4, 5. Similarly, let $\bar X_2$ represent the means of random samples of size $n_2 = 3$, drawn with replacement from the population 0, 3. Assuming that the 72 possible differences $\bar X_1 - \bar X_2$ are equally likely to occur, construct the sampling distribution of $\bar X_1 - \bar X_2$. Show that

(i) $\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2$ (ii) $\sigma^2_{\bar X_1 - \bar X_2} = \dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$

{$\mu_1 = 4$, $\mu_2 = 1.5$, $\mu_{\bar X_1 - \bar X_2} = 2.5$, $\sigma_1^2 = 0.667$, $\sigma_2^2 = 2.25$, $\sigma^2_{\bar X_1 - \bar X_2} = 1.083$}


(b) Let the variable $\bar X_1$ represent the means of random samples of size 2, drawn without replacement from the finite population 5, 7, 9. Similarly, let $\bar X_2$ represent the means of random samples of size 2, drawn without replacement from another finite population 4, 6, 8. Assuming that the 36 possible differences $\bar X_1 - \bar X_2$ are equally likely to occur, construct the sampling distribution of $\bar X_1 - \bar X_2$ and verify that

(i) $\mu_{\bar X_1 - \bar X_2} = \mu_1 - \mu_2$

(ii) $\sigma^2_{\bar X_1 - \bar X_2} = \dfrac{\sigma_1^2}{n_1}\left(\dfrac{N_1 - n_1}{N_1 - 1}\right) + \dfrac{\sigma_2^2}{n_2}\left(\dfrac{N_2 - n_2}{N_2 - 1}\right)$

{$\mu_1 = 7$, $\mu_2 = 6$, $\mu_{\bar X_1 - \bar X_2} = 1$, $\sigma_1^2 = 2.667$, $\sigma_2^2 = 2.667$, $\sigma^2_{\bar X_1 - \bar X_2} = 1.333$}

3. (a) The television picture tubes of manufacturer A have a mean lifetime of 6.5 years and a standard deviation of 0.9 years, while those of manufacturer B have a mean lifetime of 6.0 years and a standard deviation of 0.8 years. A random sample of 36 tubes is selected from manufacturer A and its mean $\bar X_1$ is calculated. Another random sample of 49 tubes is selected from manufacturer B and its mean $\bar X_2$ is calculated. Find the mean and standard error of the sampling distribution of the difference $\bar X_1 - \bar X_2$.

{$\mu_{\bar X_1 - \bar X_2} = 0.5$, $\sigma_{\bar X_1 - \bar X_2} = 0.1886$}


(b) Random samples, each of size 100, are drawn from two independent probability distributions and their means $\bar X_1$ and $\bar X_2$ computed. If the means and standard deviations of the two populations are $\mu_1 = 10$, $\sigma_1 = 2$, $\mu_2 = 8$, $\sigma_2 = 1$, find the mean and standard error of the sampling distribution of the difference $\bar X_1 - \bar X_2$.

{$\mu_{\bar X_1 - \bar X_2} = 2$, $\sigma_{\bar X_1 - \bar X_2} = 0.2236$}


11.14 SAMPLING DISTRIBUTION OF
SAMPLE PROPORTION, P

The sampling distribution of sample proportion P is the probability distribution of the proportions of successes obtained from all possible simple random samples of n observations that can be drawn from a Bernoulli population with proportion of successes $\pi$.


11.14.1 Population Proportion. The population proportion is defined as

$$\pi = \frac{\text{No. of elements with attribute } A}{\text{Population size}} = \frac{k}{N}$$

where k is the number of elements in the population of size N that possess a certain characteristic. In many applications of sampling the characteristic of interest in the population elements is qualitative with two possible outcomes. Quite often, however, we are interested not in the number of successes but rather in the proportion of successes.


11.14.2 Sample Statistics X and P. When the characteristic of interest is qualitative with two possible outcomes, a sample statistic of interest is the number of occurrences, among the n sample observations, of the particular outcome reflected in the population proportion. This number of occurrences is denoted by X. Another sample statistic is the sample proportion, denoted by P, which is defined as

$$P = \frac{\text{No. of elements with attribute } A}{\text{Sample size}} = \frac{X}{n}$$


The observed value p = x/n of the sample proportion P will serve as an estimate of $\pi$. Obviously, the actual value we obtain for p will vary from sample to sample. So we ask: how good will the estimate be? Are the values of P likely to be close to the true proportion $\pi$ in the population? To what extent will they vary from one sample to another? Now for our theoretical model, we define a population in which a given proportion $\pi$ have a specific attribute A. We suppose that every unit in the population falls into one of the two categories $A$ and $\bar A$ (not $A$). The notation is as follows:

Number of units in A              Proportion of units in A
Population      Sample            Population        Sample
k               x                 $\pi = k/N$       $p = x/n$


The estimate of the proportion of successes $\pi$ in the population is the sample proportion p, and the estimate of the total number of successes k in the population is thus Np or Nx/n.


11.14.3 Binomial Distribution as Sampling Distribution: Sampling Infinite Populations. If a simple random sample of size n is selected from an infinite population (or with replacement from a finite population) whose elements are characterised by some attribute to belong to one of the two mutually exclusive and exhaustive categories, where one of these will be designated a 'success' and the other will be designated a 'failure', then the exact sampling distribution of the proportion of successes P is a binomial distribution.
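To make this statement concrete, the probabilities of the possible values of P can be computed from the binomial formula. The sketch below is only an illustration: the function name `binomial_pmf_of_P` is made up, and the values n = 2 and $\pi$ = 2/5 are assumed for the demonstration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf_of_P(n, pi):
    """Probability of each value x/n of the sample proportion P,
    when the number of successes X is Binomial(n, pi)."""
    return {Fraction(x, n): comb(n, x) * pi**x * (1 - pi)**(n - x)
            for x in range(n + 1)}

# Illustrative values (assumed): n = 2 draws, pi = 2/5
pmf = binomial_pmf_of_P(2, Fraction(2, 5))
for p, prob in pmf.items():
    print(p, prob)
```

With exact fractions the probabilities sum to 1 without rounding error, which is a convenient sanity check on any table built by hand.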


11.14.4 Properties of Sampling Distribution of P. The properties of the sampling distribution 
of the sample proportion P are as follows: 


Mean and Variance. The mean and variance of the binomial sampling distribution of P for a 
simple random sample of size n from an infinite Bernoulli population (or with replacement from 
a finite Bernoulli population) are given in the following theorem. 


Theorem 11.11 If the population is infinite or the sampling is done with replacement, the sample proportion P has its mean and variance as

$$\mu_P = E(P) = \pi$$

$$\sigma_P^2 = \mathrm{Var}(P) = \frac{\pi(1 - \pi)}{n}$$

where $\pi$ is the probability of success and $(1 - \pi)$ is the probability of failure. The standard deviation (often called the standard error or sampling variability) is

$$\sigma_P = \sqrt{\mathrm{Var}(P)} = \sqrt{\frac{\pi(1 - \pi)}{n}}$$

However, if the value of $\pi$ is unknown, it is replaced by the sample proportion P; the estimate of the standard error of P then becomes

$$\hat\sigma_P = \sqrt{\frac{P(1 - P)}{n}}$$


Shape of Distribution. The sampling distribution of P is skewed to the right if $\pi < 0.5$, skewed to the left if $\pi > 0.5$ and symmetrical if $\pi = 0.5$.


Normal approximation. As n tends to infinity, the distribution of P becomes approximately normal with mean $\pi$ and variance $\pi(1 - \pi)/n$. That is, the distribution of the random variable

$$Z = \frac{P - \mu_P}{\sigma_P} = \frac{P - \pi}{\sqrt{\pi(1 - \pi)/n}}$$

approaches the standard normal distribution as n approaches infinity.
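How close the approximation is for a moderate n can be examined numerically. The following Python sketch compares the exact binomial probability with the normal approximation; the values n = 100, $\pi$ = 0.5 and the cut-off 0.55 are assumed purely for illustration, and the continuity correction in the last step is a standard refinement that is not discussed in this section.

```python
import math
from statistics import NormalDist

def binomial_cdf(x, n, pi):
    """Exact P(X <= x) for X ~ Binomial(n, pi)."""
    return sum(math.comb(n, k) * pi**k * (1 - pi)**(n - k)
               for k in range(x + 1))

n, pi = 100, 0.5      # illustrative values, not taken from the text
x = 55                # we evaluate P(P <= 0.55), i.e. P(X <= 55)

exact = binomial_cdf(x, n, pi)

# Normal distribution with mean pi and standard error sqrt(pi(1 - pi)/n)
normal = NormalDist(mu=pi, sigma=math.sqrt(pi * (1 - pi) / n))
approx = normal.cdf(x / n)
approx_cc = normal.cdf((x + 0.5) / n)   # with continuity correction (assumption)

print(round(exact, 4), round(approx, 4), round(approx_cc, 4))
```

Increasing `n` while keeping `pi` fixed should bring the plain approximation closer to the exact value.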




Example 11.11 A population consists of 5 members. The marital status of each member is given below

Member:          1   2   3   4   5
Marital status:  S   M   S   M   S

where M and S stand for married and single respectively. Determine the proportion of married members in the population. Take all possible samples of two members with replacement from this population and find the proportion of married members in each sample. Form the sampling distribution of the sample proportion P and verify that

(i) $\mu_P = \pi$ (ii) $\sigma_P^2 = \dfrac{\pi(1 - \pi)}{n}$


Solution. Population: 1, 2, 3, 4, 5; Population size: N = 5; Sample size: n = 2

The members with even serial numbers 2 and 4 are married, while those with odd serial numbers 1, 3 and 5 are single.

Number of married members in the population: k = 2

Proportion of married members in the population: $\pi = \dfrac{k}{N} = \dfrac{2}{5} = 0.4$

Number of possible samples = $N \times N = 5 \times 5 = 25$


All possible samples, the number of married members and the proportion of married members in each sample are given below.

Members     Number of         Proportion of      |  Members     Number of         Proportion of
in sample   married members   married members    |  in sample   married members   married members
            x                 p = x/n            |              x                 p = x/n
1, 1        0                 0                  |  3, 3        0                 0
1, 2        1                 1/2                |  3, 4        1                 1/2
1, 3        0                 0                  |  3, 5        0                 0
1, 4        1                 1/2                |  4, 1        1                 1/2
1, 5        0                 0                  |  4, 2        2                 1
2, 1        1                 1/2                |  4, 3        1                 1/2
2, 2        2                 1                  |  4, 4        2                 1
2, 3        1                 1/2                |  4, 5        1                 1/2
2, 4        2                 1                  |  5, 1        0                 0
2, 5        1                 1/2                |  5, 2        1                 1/2
3, 1        0                 0                  |  5, 3        0                 0
3, 2        1                 1/2                |  5, 4        1                 1/2
                                                 |  5, 5        0                 0


The sampling distribution of sample proportion P, its mean and variance are

Value of P   Number of occurrences   Probability
p            f                       $f(p) = f/\sum f$   $p\,f(p)$   $p^2 f(p)$
0            9                       9/25                0           0
1/2          12                      12/25               6/25        3/25
1            4                       4/25                4/25        4/25
Sum          25                      1                   10/25       7/25

$$\mu_P = E(P) = \sum p\,f(p) = \frac{10}{25} = 0.4$$

$$\sigma_P^2 = \mathrm{Var}(P) = \sum p^2 f(p) - \mu_P^2 = \frac{7}{25} - (0.4)^2 = 0.12$$

We are to verify that

(i) $\mu_P = \pi$:   0.4 = 0.4

(ii) $\sigma_P^2 = \dfrac{\pi(1 - \pi)}{n}$:   $0.12 = \dfrac{0.4(1 - 0.4)}{2}$, that is, 0.12 = 0.12
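The 25 samples of Example 11.11 can also be enumerated by a short program. The sketch below is an illustration only; the dictionary `status` simply encodes the marital-status table of the example, and the variable names are assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Marital status of the five members in Example 11.11
status = {1: "S", 2: "M", 3: "S", 4: "M", 5: "S"}
n = 2

# Proportion of married members in every ordered sample drawn with replacement
props = [Fraction(sum(status[m] == "M" for m in sample), n)
         for sample in product(status, repeat=n)]

dist = Counter(props)                       # value of P -> number of occurrences
mu_P = sum(props) / len(props)
var_P = sum(p * p for p in props) / len(props) - mu_P ** 2
pi = Fraction(sum(s == "M" for s in status.values()), len(status))

print(sorted(dist.items()))
print(mu_P == pi, var_P == pi * (1 - pi) / n)
```

The counter reproduces the frequency column of the table, and the two comparisons correspond to parts (i) and (ii) of the verification.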


Example 11.12 It is known that 3 % of the persons living in Gujranwala city have a certain disease. Find the mean and standard error of the sampling distribution of the proportion of diseased persons in a random sample of 500 persons.

Solution. We have proportion in the population $\pi = 0.03$ and the sample size n = 500. Let P be the random variable 'the proportion of persons in the sample which are diseased'. Then, the mean and standard error of P are

$$\mu_P = \pi = 0.03$$

$$\sigma_P = \sqrt{\frac{\pi(1 - \pi)}{n}} = \sqrt{\frac{0.03(1 - 0.03)}{500}} = 0.00763$$
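The arithmetic of Example 11.12 is a direct application of Theorem 11.11. As a quick check, it can be reproduced with a few lines of Python; the function name `se_proportion` is hypothetical.

```python
import math

def se_proportion(pi, n):
    """Standard error of the sample proportion for an infinite population
    (or sampling with replacement): sqrt(pi * (1 - pi) / n)."""
    return math.sqrt(pi * (1 - pi) / n)

pi, n = 0.03, 500            # figures of Example 11.12
mu_P = pi                    # the mean of P equals the population proportion
sigma_P = se_proportion(pi, n)
print(mu_P, round(sigma_P, 5))
```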


11.14.5 Hypergeometric Distribution as Sampling Distribution: Sampling Finite Populations. When a simple random sample of size n is selected without replacement from a finite population whose elements are characterised by some attribute to belong to one of the two mutually exclusive and exhaustive categories, where one of these will be designated a 'success' and the other will be designated a 'failure', then the exact sampling distribution of the proportion of successes P is a hypergeometric distribution.
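The hypergeometric probabilities of the possible values of P can be computed directly from combinations. The following sketch is illustrative only: the function name `hypergeometric_pmf_of_P` is made up, and the values N = 5, k = 2, n = 2 are assumed for the demonstration.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf_of_P(N, k, n):
    """Probability of each value x/n of the sample proportion P when n units
    are drawn without replacement from N units, k of which are successes."""
    lowest = max(0, n - (N - k))     # fewest successes a sample can contain
    highest = min(n, k)              # most successes a sample can contain
    return {Fraction(x, n): Fraction(comb(k, x) * comb(N - k, n - x), comb(N, n))
            for x in range(lowest, highest + 1)}

# Illustrative values (assumed): N = 5 units, k = 2 successes, n = 2 draws
pmf = hypergeometric_pmf_of_P(5, 2, 2)
for p, prob in pmf.items():
    print(p, prob)
```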

11.14.6 Properties of Sampling Distribution of P. The properties of the sampling distribution of the sample proportion P are as follows:

Mean and Variance. The mean and variance of the hypergeometric sampling distribution of P for simple random sampling without replacement from a finite Bernoulli population are given in the following theorem.


Theorem 11.12 If the population is finite and the sampling is done without replacement, the sample proportion P has its mean and variance as

$$\mu_P = E(P) = \pi$$

$$\sigma_P^2 = \mathrm{Var}(P) = \frac{\pi(1 - \pi)}{n}\left(\frac{N - n}{N - 1}\right)$$

where $\pi$ is the probability of success and $(1 - \pi)$ is the probability of failure. The standard deviation (often called the standard error or sampling variability) is

$$\sigma_P = \sqrt{\frac{\pi(1 - \pi)}{n}\left(\frac{N - n}{N - 1}\right)}$$

However, if the value of $\pi$ is unknown, it is replaced by the sample proportion P; the estimate of the standard error of P then becomes

$$\hat\sigma_P = \sqrt{\frac{P(1 - P)}{n}\left(\frac{N - n}{N - 1}\right)}$$


Example 11.13 Draw all possible samples of size 2 at random without replacement from the population 1, 2, 3, 4, 5. Find the proportion of even numbers in the samples. Form the sampling distribution of the sample proportion P and verify that

(i) $\mu_P = \pi$ (ii) $\sigma_P^2 = \dfrac{\pi(1 - \pi)}{n}\left(\dfrac{N - n}{N - 1}\right)$

Solution. Population: 1, 2, 3, 4, 5; Population size: N = 5; Sample size: n = 2

Number of even numbers in the population: k = 2
Proportion of even numbers in the population: $\pi = \dfrac{k}{N} = \dfrac{2}{5} = 0.4$
Number of possible samples = N(N - 1) = 5(5 - 1) = 20


All possible samples, the number of even numbers and the proportion of even numbers in each sample are given below.

Sample    Number of       Proportion of   |  Sample    Number of       Proportion of
values    even numbers    even numbers    |  values    even numbers    even numbers
          x               p = x/n         |            x               p = x/n
1, 2      1               1/2             |  3, 4      1               1/2
1, 3      0               0               |  3, 5      0               0
1, 4      1               1/2             |  4, 1      1               1/2
1, 5      0               0               |  4, 2      2               1
2, 1      1               1/2             |  4, 3      1               1/2
2, 3      1               1/2             |  4, 5      1               1/2
2, 4      2               1               |  5, 1      0               0
2, 5      1               1/2             |  5, 2      1               1/2
3, 1      0               0               |  5, 3      0               0
3, 2      1               1/2             |  5, 4      1               1/2


The sampling distribution of sample proportion P, its mean and variance are

Value of P   Number of occurrences   Probability
p            f                       $f(p) = f/\sum f$   $p\,f(p)$   $p^2 f(p)$
0            6                       6/20                0           0
1/2          12                      12/20               6/20        3/20
1            2                       2/20                2/20        2/20
Sum          20                      1                   8/20        5/20

$$\mu_P = E(P) = \sum p\,f(p) = \frac{8}{20} = 0.4$$

$$\sigma_P^2 = \mathrm{Var}(P) = \sum p^2 f(p) - \mu_P^2 = \frac{5}{20} - (0.4)^2 = 0.09$$

We are to verify that

(i) $\mu_P = \pi$:   0.4 = 0.4

(ii) $\sigma_P^2 = \dfrac{\pi(1 - \pi)}{n}\left(\dfrac{N - n}{N - 1}\right)$:   $0.09 = \dfrac{0.4(1 - 0.4)}{2}\left(\dfrac{5 - 2}{5 - 1}\right)$, that is, 0.09 = 0.09
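For sampling without replacement, the ordered samples of Example 11.13 correspond to permutations rather than Cartesian products. A minimal Python sketch of the same verification follows; the variable names are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import permutations

population = [1, 2, 3, 4, 5]     # population of Example 11.13
N, n = len(population), 2

# Proportion of even numbers in every ordered sample drawn without replacement
props = [Fraction(sum(x % 2 == 0 for x in sample), n)
         for sample in permutations(population, n)]

mu_P = sum(props) / len(props)
var_P = sum(p * p for p in props) / len(props) - mu_P ** 2

pi = Fraction(sum(x % 2 == 0 for x in population), N)
fpc = Fraction(N - n, N - 1)     # finite population correction factor

print(len(props), mu_P, var_P)
print(mu_P == pi, var_P == pi * (1 - pi) / n * fpc)
```

Dropping the factor `fpc` from the last comparison makes it fail, which shows why the with-replacement formula overstates the variance here.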


11.15 SAMPLING DISTRIBUTION OF THE DIFFERENCE
BETWEEN TWO SAMPLE PROPORTIONS, $P_1 - P_2$

The sampling distribution of the difference between two sample proportions $P_1 - P_2$ is the probability distribution of all possible differences between proportions $P_1$ and $P_2$ obtained from all possible independent simple random samples of $n_1$ and $n_2$ observations that can be drawn from two Bernoulli populations with population proportions of $\pi_1$ and $\pi_2$, respectively. Often we wish to compare the proportions of successes in two Bernoulli populations. We must use the sample proportions of successes as our basis of comparison. Obviously, the number of successes in both samples cannot be used alone as a means of evaluation. Specifically, we require a probability model of the difference between two sample proportions.

Suppose that two independent random samples of sizes $n_1$ and $n_2$ are drawn from Bernoulli populations with population proportions of $\pi_1$ and $\pi_2$ respectively. Let $P_1$ be the proportion of successes in the sample of size $n_1$ from the population with population proportion $\pi_1$; then $P_1$ is a random variable that has its own probability distribution with mean $\pi_1$ and variance $\pi_1(1 - \pi_1)/n_1$. Let $P_2$ be the proportion of successes in the sample of size $n_2$ from the population with population proportion $\pi_2$; then $P_2$ is a random variable that has its own probability distribution with mean $\pi_2$ and variance $\pi_2(1 - \pi_2)/n_2$. Then the difference $P_1 - P_2$ can be obtained from all possible pairs of $P_1$ and $P_2$. Consequently, the difference $P_1 - P_2$ between the two sample proportions is a random variable that has its own probability distribution, which is called the sampling distribution of the difference between two sample proportions.


11.15.1 Properties of the Sampling Distribution of the Difference between Two Sample Proportions. The properties of the sampling distribution of the difference $P_1 - P_2$ between two sample proportions are given by the following theorems.

Theorem 11.13 The mean of the sampling distribution of $P_1 - P_2$, denoted by $\mu_{P_1 - P_2}$, is equal to the difference between the population proportions, i.e.,

$$\mu_{P_1 - P_2} = E(P_1 - P_2) = \pi_1 - \pi_2$$

This theorem holds regardless of the sample sizes $n_1$ and $n_2$ or whether the sampling is done with or without replacement.


Theorem 11.14 If the populations are infinite or the sampling is done with replacement, the difference between sample proportions $P_1 - P_2$ has its variance as

$$\sigma^2_{P_1 - P_2} = \mathrm{Var}(P_1 - P_2) = \frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}$$

The standard error of $P_1 - P_2$ becomes

$$\sigma_{P_1 - P_2} = \sqrt{\mathrm{Var}(P_1 - P_2)} = \sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}$$

However, if the values of $\pi_1$ and $\pi_2$ are unknown, these are replaced by the sample proportions $P_1$ and $P_2$; the estimate of the standard error of $P_1 - P_2$ then becomes

$$\hat\sigma_{P_1 - P_2} = \sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}$$
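Theorem 11.15 If the populations are finite and the sampling is done without replacement, the difference between sample proportions $P_1 - P_2$ has its variance as

$$\sigma^2_{P_1 - P_2} = \mathrm{Var}(P_1 - P_2) = \frac{\pi_1(1 - \pi_1)}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\pi_2(1 - \pi_2)}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)$$

The standard error of $P_1 - P_2$ is

$$\sigma_{P_1 - P_2} = \sqrt{\frac{\pi_1(1 - \pi_1)}{n_1}\left(\frac{N_1 - n_1}{N_1 - 1}\right) + \frac{\pi_2(1 - \pi_2)}{n_2}\left(\frac{N_2 - n_2}{N_2 - 1}\right)}$$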


Example 11.14 Let $P_1$ represent the proportion of odd numbers in a sample of size $n_1 = 2$ selected at random with replacement from a finite population consisting of values 4 and 5. Similarly, let $P_2$ represent the proportion of odd numbers in a sample of size $n_2 = 2$ selected at random with replacement from another finite population consisting of values 2, 3 and 6. Form a sampling distribution of the random variable $(P_1 - P_2)$. Verify that

(i) $\mu_{P_1 - P_2} = \pi_1 - \pi_2$ (ii) $\sigma^2_{P_1 - P_2} = \dfrac{\pi_1(1 - \pi_1)}{n_1} + \dfrac{\pi_2(1 - \pi_2)}{n_2}$
Solution. We have
Population I: 4, 5; $N_1 = 2$; $n_1 = 2$
Number of odd numbers: $k_1 = 1$
Proportion of odd numbers: $\pi_1 = \dfrac{k_1}{N_1} = \dfrac{1}{2}$
Number of possible samples: $N_1 \times N_1 = 2 \times 2 = 4$
Possible samples:
(4, 4) (4, 5) (5, 4) (5, 5)
Sample proportion of odd numbers: $p_1$
0 1/2 1/2 1
Population II: 2, 3, 6; $N_2 = 3$; $n_2 = 2$
Number of odd numbers: $k_2 = 1$
Proportion of odd numbers: $\pi_2 = \dfrac{k_2}{N_2} = \dfrac{1}{3}$
Number of possible samples: $N_2 \times N_2 = 3 \times 3 = 9$

Possible samples:
(2, 2) (2, 3) (2, 6) (3, 2) (3, 3) (3, 6) (6, 2) (6, 3) (6, 6)
Sample proportion of odd numbers: $p_2$
0 1/2 0 1/2 1 1/2 0 1/2 0
All possible differences between sample proportions $(p_1 - p_2)$ are

               $p_2$:   0      1/2    0      1/2    1      1/2    0      1/2    0
$p_1 = 0$               0     -1/2    0     -1/2   -1     -1/2    0     -1/2    0
$p_1 = 1/2$             1/2    0      1/2    0     -1/2    0      1/2    0      1/2
$p_1 = 1/2$             1/2    0      1/2    0     -1/2    0      1/2    0      1/2
$p_1 = 1$               1      1/2    1      1/2    0      1/2    1      1/2    1


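The sampling distribution of $P_1 - P_2$, its mean and variance are

Value of       Number of        Probability
$p_1 - p_2$    occurrences f    $f(p_1 - p_2) = f/\sum f$    $(p_1 - p_2)\,f(p_1 - p_2)$    $(p_1 - p_2)^2 f(p_1 - p_2)$
-1             1                1/36                         -1/36                          1/36
-1/2           6                6/36                         -3/36                          3/72
0              13               13/36                        0                              0
1/2            12               12/36                        6/36                           3/36
1              4                4/36                         4/36                           4/36
Sum            36               1                            6/36                           19/72

$$\mu_{P_1 - P_2} = E(P_1 - P_2) = \sum (p_1 - p_2)\,f(p_1 - p_2) = \frac{6}{36} = \frac{1}{6}$$

$$\sigma^2_{P_1 - P_2} = \mathrm{Var}(P_1 - P_2) = \sum (p_1 - p_2)^2 f(p_1 - p_2) - \mu^2_{P_1 - P_2} = \frac{19}{72} - \left(\frac{1}{6}\right)^2 = \frac{17}{72}$$

We are to verify that

(i) $\mu_{P_1 - P_2} = \pi_1 - \pi_2$:   $\dfrac{1}{6} = \dfrac{1}{2} - \dfrac{1}{3}$, that is, $\dfrac{1}{6} = \dfrac{1}{6}$

(ii) $\sigma^2_{P_1 - P_2} = \dfrac{\pi_1(1 - \pi_1)}{n_1} + \dfrac{\pi_2(1 - \pi_2)}{n_2}$:   $\dfrac{17}{72} = \dfrac{\frac{1}{2}\left(1 - \frac{1}{2}\right)}{2} + \dfrac{\frac{1}{3}\left(1 - \frac{1}{3}\right)}{2} = \dfrac{1}{8} + \dfrac{1}{9}$, that is, $\dfrac{17}{72} = \dfrac{17}{72}$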
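The 36 differences of Example 11.14 can be generated and summarised programmatically. The sketch below assumes the helper name `odd_proportions`, which is not part of the text, and uses exact fractions throughout.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def odd_proportions(population, n):
    """Proportion of odd numbers in every ordered sample of size n
    drawn with replacement."""
    return [Fraction(sum(x % 2 for x in s), n)
            for s in product(population, repeat=n)]

pop1, n1 = [4, 5], 2        # population I of Example 11.14
pop2, n2 = [2, 3, 6], 2     # population II of Example 11.14

diffs = [a - b for a in odd_proportions(pop1, n1)
               for b in odd_proportions(pop2, n2)]

dist = Counter(diffs)
mu = sum(diffs) / len(diffs)
var = sum(d * d for d in diffs) / len(diffs) - mu ** 2

pi1 = Fraction(sum(x % 2 for x in pop1), len(pop1))
pi2 = Fraction(sum(x % 2 for x in pop2), len(pop2))

print(sorted(dist.items()))
print(mu == pi1 - pi2)
print(var == pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2)
```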


Example 11.15 The actual proportion of men who like a certain TV programme is 0.30 and the corresponding proportion for women is 0.25. A questionnaire about this programme is given to 500 men and 500 women, and the individual responses are looked upon as the values of independent random variables having Bernoulli distributions with parameters $\pi_1 = 0.30$ and $\pi_2 = 0.25$, respectively. Find the mean and standard error of $P_1 - P_2$, the difference between the sample proportions of successes.

Solution. We have $\pi_1 = 0.30$, $\pi_2 = 0.25$, $n_1 = 500$, $n_2 = 500$.
The mean and standard error of $P_1 - P_2$ are

$$\mu_{P_1 - P_2} = \pi_1 - \pi_2 = 0.30 - 0.25 = 0.05$$

$$\sigma_{P_1 - P_2} = \sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}} = \sqrt{\frac{0.30(1 - 0.30)}{500} + \frac{0.25(1 - 0.25)}{500}} = 0.028$$

11.16 OTHER SAMPLING DISTRIBUTIONS 

We have considered the sampling distributions of sample mean, difference between sample means, sample proportion and difference between sample proportions. Other statistics such as sample median, sample variance and sample standard deviation have their own sampling distributions. There is a different sampling distribution for each different statistic, even though the statistics may be computed from the same sample. For a given statistic, the sampling distribution will vary for samples of different sizes. Thus, in a sampling distribution it is necessary to specify the population, the statistic and the size of the sample. A change in any of these specifications will result in a different sampling distribution.


11.17 SAMPLING DISTRIBUTION OF
THE SAMPLE VARIANCE, $S^2$

The sampling distribution of sample variance $S^2$ is the probability distribution of the variances obtained from all possible simple random samples of n observations that can be drawn from a population with variance $\sigma^2$.

The sampling distribution of sample variance has the property

$$\mu_{S^2} = E(S^2) = \frac{n - 1}{n}\,\sigma^2$$


Example 11.16 A population consists of five numbers 2, 4, 6, 8 and 10. Consider all possible samples of size 2 which can be drawn with replacement from this population. Form the sampling distribution of sample variance and verify that

$$\mu_{S^2} = \frac{n - 1}{n}\,\sigma^2$$

Solution. Population: 2, 4, 6, 8, 10; Population size: N = 5; Sample size: n = 2
Number of possible samples = $N \times N = 5 \times 5 = 25$
All possible samples that can be drawn with replacement from our population are

(2, 2)    (2, 4)    (2, 6)    (2, 8)    (2, 10)
(4, 2)    (4, 4)    (4, 6)    (4, 8)    (4, 10)
(6, 2)    (6, 4)    (6, 6)    (6, 8)    (6, 10)
(8, 2)    (8, 4)    (8, 6)    (8, 8)    (8, 10)
(10, 2)   (10, 4)   (10, 6)   (10, 8)   (10, 10)

All possible sample variances: $s^2 = \dfrac{\sum (x_i - \bar x)^2}{n} = \dfrac{(x_1 - x_2)^2}{4}$ when n = 2

0     1     4     9     16
1     0     1     4     9
4     1     0     1     4
9     4     1     0     1
16    9     4     1     0


The sampling distribution of sample variance $S^2$ and its mean are

Value of $S^2$   Number of occurrences   Probability
$s^2$            f                       $p(s^2) = f/\sum f$   $s^2\,p(s^2)$
0                5                       5/25                  0
1                8                       8/25                  8/25
4                6                       6/25                  24/25
9                4                       4/25                  36/25
16               2                       2/25                  32/25
Sum              25                      1                     100/25

$$\mu_{S^2} = E(S^2) = \sum s^2\,p(s^2) = \frac{100}{25} = 4$$


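The mean and variance of the population are

$$\mu = \frac{2 + 4 + 6 + 8 + 10}{5} = 6, \qquad \sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{16 + 4 + 0 + 4 + 16}{5} = 8$$

We are to verify that $\mu_{S^2} = \dfrac{n - 1}{n}\,\sigma^2$:

$$4 = \frac{2 - 1}{2}\,(8)$$

4 = 4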
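The sampling distribution of $S^2$ in Example 11.16 can be rebuilt by enumeration as well. In the sketch below the helper `variance_n` (a made-up name) divides by n, matching the definition of $S^2$ used in this section rather than the n - 1 divisor.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def variance_n(values):
    """Variance with divisor n (the definition of S^2 used in this section)."""
    m = Fraction(sum(values), len(values))
    return sum((x - m) ** 2 for x in values) / len(values)

population = [2, 4, 6, 8, 10]    # population of Example 11.16
n = 2

# Sample variance of every ordered sample drawn with replacement
s2_values = [variance_n(s) for s in product(population, repeat=n)]

dist = Counter(s2_values)                     # value of S^2 -> occurrences
mean_s2 = sum(s2_values) / len(s2_values)     # mean of the sampling distribution
sigma2 = variance_n(population)               # population variance

print(sorted(dist.items()))
print(mean_s2, sigma2, mean_s2 == Fraction(n - 1, n) * sigma2)
```

The result shows that $S^2$ with divisor n underestimates $\sigma^2$ on average by the factor (n - 1)/n.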





Exercise 11.4 


1. (a) A fair coin is tossed 50 times and the number of heads recorded is 27. The proportion of heads was, therefore, estimated to be 0.54. Answer the following.
(i) Which figure is a parameter?
(ii) Which figure is a statistic?

{(i) The probability of head in a single trial $\pi = 0.5$;
(ii) Sample size n = 50, number of heads in the sample x = 27 and the proportion of heads in the sample p = x/n = 0.54.}

(b) What is meant by the sampling distribution of sample proportion? Describe the properties of the sampling distribution of sample proportion.


2. (a) A finite population consists of the numbers 2, 3, 4, 5, 6 and 8. Find the proportion P of even numbers in all possible random samples of size n = 2 that can be drawn with replacement from this population. Assuming the 36 possible samples equally likely, make the sampling distribution of sample proportions and find the mean and variance of this distribution. Verify that

(i) $E(P) = \pi$ (ii) $\mathrm{Var}(P) = \dfrac{\pi(1 - \pi)}{n}$

where P and $\pi$ are sample and population proportions respectively.

{$\pi = 2/3$, $\mu_P = 2/3$, $\sigma_P^2 = 1/9$}

(b) A population consists of N = 4 numbers 1, 3, 4 and 5. Find the proportion P of odd numbers in all possible samples of size n = 3 that can be drawn without replacement from this population. Assuming the 24 possible samples equally likely, construct the sampling distribution of sample proportions and find the mean and variance of this distribution. Verify that

(i) $\mu_P = \pi$ (ii) $\sigma_P^2 = \dfrac{\pi(1 - \pi)}{n}\left(\dfrac{N - n}{N - 1}\right)$

where P and $\pi$ are sample and population proportions respectively.

{$\pi = 3/4$, $\mu_P = 3/4$, $\sigma_P^2 = 1/48$}

3. (a) Suppose that 60 % of a city population favours public funding for a proposed recreational facility. If 150 persons are to be randomly selected and interviewed, what are the mean and standard error of the sample proportion favouring this issue?

{$\mu_P = 0.60$, $\sigma_P = 0.04$}

(b) A small professional society has N = 4500 members. The president has mailed n = 400 questionnaires to a random sample of members asking whether they wish to affiliate with a large group. Assuming that the proportion of the entire membership favouring consolidation is $\pi = 0.7$, find the mean and standard error of the sample proportion P.

{$\mu_P = 0.7$, $\sigma_P = 0.022$}

4. (a) What is meant by the sampling distribution of the difference between two sample proportions? Describe the properties of the sampling distribution of the difference between two sample proportions. Explain its usefulness in statistical inference.

(b) Let $P_1$ represent the proportion of odd numbers in a random sample of size $n_1 = 3$ with replacement from a finite population consisting of values 4 and 5. Similarly, let $P_2$ represent the proportion of odd numbers in a random sample of size $n_2 = 2$ with replacement from another finite population consisting of values 2, 3 and 6. Assuming that the 72 possible differences $P_1 - P_2$ are equally likely to occur, construct the sampling distribution of $P_1 - P_2$. Verify that

(i) $\mu_{P_1 - P_2} = \pi_1 - \pi_2$ (ii) $\sigma^2_{P_1 - P_2} = \dfrac{\pi_1(1 - \pi_1)}{n_1} + \dfrac{\pi_2(1 - \pi_2)}{n_2}$


5. (a) Let $P_1$ represent the proportion of even numbers in a random sample of size $n_1 = 2$
drawn without replacement from a finite population consisting of values 4, 6 and 9.
Similarly, let $P_2$ represent the proportion of even numbers in a random sample of size
$n_2 = 2$ drawn without replacement from another finite population consisting of values 2, 3
and 5. Assuming that the 36 possible differences $P_1 - P_2$ are equally likely to occur,
construct the sampling distribution of $P_1 - P_2$. Verify that

(i) $\mu_{P_1 - P_2} = \pi_1 - \pi_2$
(ii) $\sigma^2_{P_1 - P_2} = \frac{\pi_1(1 - \pi_1)}{n_1} \cdot \frac{N_1 - n_1}{N_1 - 1} + \frac{\pi_2(1 - \pi_2)}{n_2} \cdot \frac{N_2 - n_2}{N_2 - 1}$

{ $\pi_1 = 2/3$, $\pi_2 = 1/3$, $\mu_{P_1 - P_2} = 1/3$, $\sigma^2_{P_1 - P_2} = 1/9$ }

(b) The percentage of families with a monthly income of Rs. 1,000 or more in city A and
city B is 25% and 20% respectively. If a random sample of 100 families is selected
from each of these two cities and the proportions of families earning Rs. 1,000 or more
in the two samples are compared, what are the mean and standard error of $P_1 - P_2$, the
difference between the sample proportions?
{ $\mu_{P_1 - P_2} = 0.05$, $\sigma_{P_1 - P_2} = 0.059$ }

6. (a) A finite population consists of five values 2, 4, 6, 8 and 10. Take all possible samples
of size 2 which can be drawn with replacement from this population. Assuming the 25
possible samples equally likely, construct the sampling distributions of sample means
and sample variances and find the mean of these distributions. Calculate the mean and
variance of the population and verify that

(i) $\mu_{\bar{X}} = \mu$    (ii) $\mu_{S^2} = \frac{n - 1}{n} \sigma^2$

where $\bar{X} = \frac{\sum X_i}{n}$ and $S^2 = \frac{\sum (X_i - \bar{X})^2}{n}$

{ $\mu = 6$, $\sigma^2 = 8$, $\mu_{\bar{X}} = 6$, $\mu_{S^2} = 4$ }

(b) A finite population consists of five values 1, 3, 5, 7 and 9. Take all possible samples
of size 2 which can be drawn with replacement from this population. Assuming the 25
possible samples equally likely, construct the sampling distributions of sample means
and sample variances and find the mean of these distributions. Calculate the mean and
variance of the population. Discuss the results.

{ $\mu = 5$, $\sigma^2 = 8$, $\mu_{\bar{X}} = 5$, $\mu_{S^2} = 4$, $\mu_{\bar{X}} = \mu$, $\mu_{S^2} = \frac{n - 1}{n} \sigma^2$ }

7. (a) A finite population consists of 5 values 1, 3, 5, 7 and 9. Take all possible samples of
size 2 which can be drawn without replacement from this population. Assuming the
20 possible samples equally likely, construct the sampling distributions of sample
means and sample variances and find the mean of these distributions. Calculate the mean
and variance of the population and verify that

(i) $\mu_{\bar{X}} = \mu$    (ii) $\mu_{S^2} = \frac{N}{N - 1} \cdot \frac{n - 1}{n} \sigma^2$

where $\bar{X} = \frac{\sum X_i}{n}$ and $S^2 = \frac{\sum (X_i - \bar{X})^2}{n}$

{ $\mu = 5$, $\sigma^2 = 8$, $\mu_{\bar{X}} = 5$, $\mu_{S^2} = 5$ }

(b) Take all possible samples of 2 distinct values from the population 2, 4, 6, 8 and 10.
Assuming the 20 possible samples equally likely, construct the sampling distributions
of sample means and sample variances and find the mean of these distributions.
Calculate the mean and variance of the population. Discuss the results.

; J - © NGG 2) Se 







Exercise 11.5 
Objective Questions 





1. Fill in the blanks.

(i) A ________ is the totality of the observations made on all the objects possessing
some common specific characteristics. (population)
(ii) A ________ is a part of the population which is selected with the expectation that
it will represent the characteristics of the population. (sample)
(iii) ________ is a procedure of selecting a representative sample from a given
population. (Sampling)
(iv) The descriptive measures of a population are called ________. (parameters)
(v) A descriptive measure on the sample observations is called ________. (statistic)
(vi) A population is called ________ if it includes a limited number of sampling
units. (finite)
(vii) A population is called ________ if it includes an unlimited number of sampling
units. (infinite)
(viii) Sampling ________ is a complete list of the sampling units. (frame)
(ix) A ________ sampling is a procedure in which we cannot assign to an element of the
population the probability of its being included in the sample. (non-probability)
(x) A ________ sampling is a process in which the sample is selected in such a way that
every element of a population has a known nonzero probability of being included in the
sample. (probability)
(xi) Another name of a probability sampling is ________ sampling. (random)
(xii) Random sampling provides reliable ________. (estimates)
(xiii) The sampling is said to be ________ replacement when the unit selected at random
is returned to the population before the next unit is selected. (with)
(xiv) The sampling is said to be ________ replacement when the unit selected at random
is not returned to the population before the next unit is selected. (without)
(xv) A sample is usually selected by ________ replacement. (without)
(xvi) In sampling ________ replacement, a sampling unit can be selected more than
once. (with)

2. Fill in the blanks.

(i) In sampling ________ replacement, a sampling unit cannot be selected more than
once. (without)
(ii) In sampling with replacement, a finite population becomes ________. (infinite)
(iii) ________ random sampling is a procedure of selecting a sample from the population
in such a way that every unit available for sampling has an equal probability of being
selected. (simple)
(iv) The sampling error decreases by increasing the sample ________. (size)
(v) The ________ errors may be present both in sample survey and census. (non-sampling)
(vi) The bias increases by increasing the sample ________. (size)
(vii) A sample which is free from bias is called an ________ sample. (unbiased)
(viii) ________ errors may arise due to faulty sampling frames, non-responses and
processing of data. (Non-sampling)
(ix) ________ errors can be controlled by the proper training of the investigators and
following up the non-responses. (Non-sampling)
(x) The probability distribution of a sample statistic is called the ________
distribution of that statistic. (sampling)
(xi) The standard deviation of the sampling distribution of a sample statistic is called
the ________ of that statistic. (standard error)
(xii) The standard error can be reduced by increasing the ________. (sample size)
(xiii) The number of all possible samples of size n taken with replacement from a
population of size N is ________. ($N^n$)
(xiv) The number of all possible samples of size n taken without replacement from a
population of size N is ________. ($^{N}P_{n}$)

3. Mark off the following statements as true or false.

(i) A descriptive measure on the sample observations is called parameter and a
descriptive measure of a population is called statistic. (false)
(ii) A sample statistic is a random variable whereas the parameter being estimated is
constant. (true)
(iii) A sample survey provides the results which are more accurate than those obtained
from a census. (false)
(iv) A sample design is a procedure for obtaining a sample from a given population prior
to collecting any data. (true)
(v) More detailed information can be obtained in a sample survey as compared to a
census. (true)
(vi) Sampling may be the only means available for obtaining the desired information if
the population is infinite. (true)
(vii) If the data are obtained by tests that are destructive, then complete enumeration
becomes essential. (false)
(viii) Every random sample is a simple random sample. (false)
(ix) In sampling with replacement, the sample size may be greater than the population
size. (true)
(x) In sampling without replacement, the sample size can be greater than population
size. (false)
(xi) The number of units available for the next drawing does not change in a random
sampling with replacement. (true)
(xii) In sampling without replacement, the number of units remaining after each drawing
will be reduced by one. (true)

4. Mark off the following statements as true or false.

(i) The number of all possible samples of size n taken without replacement from a
population of size N is $^{N}C_{n}$. (false)
(ii) In sampling without replacement, a sampling unit can be selected more than
once. (false)
(iii) In sampling with replacement, the sample size may be greater than the population
size. (true)
(iv) In sampling with replacement, a finite population becomes infinite. (true)
(v) Non-sampling errors may be present both in sample survey and census. (true)
(vi) The sampling error increases by increasing the sample size. (false)
(vii) Sampling and non-sampling errors are both controllable. (true)
(viii) The standard deviation of a sampling distribution of a statistic is called the
standard error of that statistic. (true)
(ix) Standard error is the difference of a statistic from the parameter being
estimated. (false)
(x) We can decrease both sampling error and standard error by increasing the sample
size. (true)
(xi) The reliability of an estimate can be determined by its standard error. (true)



ESTIMATION 





12.1 STATISTICAL INFERENCE 


Statistical inference is a field concerned with drawing conclusions about distributions by 
using observed values of random variables which are governed by these distributions. 


Statistical inferences are the conclusions made about the unknown value of the
parameter of a population using the limited information contained in an observed sample taken
from it at random. The two most important types of statistical inferences are

(i) Estimation of parameters 
(ii) Testing of hypotheses 


12.2. STATISTICAL ESTIMATION 
Statistical estimation is a procedure of making a judgment about the unknown value
of a population parameter by using the sample observations.


Population parameters are estimated from sample data because it is impracticable to 
examine the entire population in order to make such an exact determination. Statistical estimation 
procedures provide estimates of population parameters with a desired degree of confidence. This 
degree of confidence can be controlled, in part, by the size of the sample (the larger the sample, 
the greater the accuracy of the estimate) and by the type of the estimate made. The statistical 
estimation of population parameters is further divided into two types 

(i) Point estimation
(ii) Interval estimation 


12.3 POINT ESTIMATION OF A PARAMETER 


The object of point estimation is to obtain a single number from the sample that is 
intended for estimating the unknown true value of a population parameter. 
12.3.1 Point Estimator. A point estimator is a sample statistic that is used to estimate the 
unknown true value of a population parameter. 


An estimator is always a statistic, which is both a function of the sample observations and a
random variable with a probability distribution. An estimator is denoted by a capital letter
(e.g., $T$, $U$, ...).

Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a population with probability
mass function or probability density function $f(x; \theta)$, then the estimator $T$ intended to
estimate $\theta$ is a function given by

$T = g(X_1, X_2, \ldots, X_n)$


12.3.2 Point Estimate. A point estimate is a specific value of an estimator computed from the
sample data after the sample has been observed. When a random sample becomes available from
the population and the estimator $T$ is computed from the sample data, the numerical value
obtained is an estimate of the population parameter $\theta$ from the particular sample. An
estimate is denoted by a small letter (e.g., $t$, $u$, ...).

Suppose that $x_1, x_2, \ldots, x_n$ is an observed random sample from a population with
probability mass function or probability density function $f(x; \theta)$, then the particular
value of an estimator $T$ intended to estimate $\theta$ is given by

$t = g(x_1, x_2, \ldots, x_n)$

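Example 12.1 A random sample selected from a normal population with mean $\mu$ and
variance $\sigma^2$ gave the values 25, 31, 23, 33, 28, 36, 22, 26. Give the point estimators for
$\mu$ and $\sigma^2$ and find their point estimates.
Solution. We have

$n = 8$, $\sum x_i = 224$, $\sum (x_i - \bar{x})^2 = 172$

The point estimator of population mean $\mu$ is $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

The point estimate of population mean $\mu$ is $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{224}{8} = 28$

The point estimators of population variance $\sigma^2$ are

$S^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2$,    $\hat{S}^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$

The point estimates of population variance $\sigma^2$ are

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{172}{8} = 21.5$,    $\hat{s}^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{172}{8 - 1} = 24.57$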
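
The arithmetic of Example 12.1 can be checked with a short script. This is only an illustrative sketch; the variable names are chosen here and are not part of the text.

```python
# Illustrative check of Example 12.1 (variable names are assumptions of this sketch).
data = [25, 31, 23, 33, 28, 36, 22, 26]
n = len(data)

x_bar = sum(data) / n                        # point estimate of the population mean
ss = sum((x - x_bar) ** 2 for x in data)     # sum of squared deviations from x_bar
s2_divisor_n = ss / n                        # estimate from the estimator with divisor n
s2_divisor_n_minus_1 = ss / (n - 1)          # estimate from the estimator with divisor n - 1

print(x_bar, ss, s2_divisor_n, round(s2_divisor_n_minus_1, 2))
```

The two variance estimates differ only in the divisor, which is the point taken up under unbiasedness in the next section.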


12.4 UNBIASEDNESS 


The distribution of an estimator should be centred in some sense at the value of the
parameter to be estimated. Because expected value is a measure of the centre of a distribution, a
reasonable requirement for an estimator $T$ may be $E(T) = \theta$. This property is called
unbiasedness of the estimator $T$. It refers to the desirability of the sampling distribution of an
estimator being centred at the parameter to be estimated.

12.4.1 Unbiased Estimator. An estimator is unbiased if the mean of its sampling distribution
is equal to the population parameter to be estimated.

Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution $f(x; \theta)$. An estimator
$T = g(X_1, X_2, \ldots, X_n)$ is said to be unbiased for parameter $\theta$ if

$E(T) = \theta$

12.4.2 Biased Estimator. An estimator $T$ of a population parameter $\theta$ is said to be biased if
$E(T) \neq \theta$
12.4.3 Bias. If an estimator $T$ of a population parameter $\theta$ is biased, the amount of its bias is
Bias $= E(T) - \theta$

If $T$ is an unbiased estimator, it will tend to give estimates nearer to $\theta$, and if $T$ is a biased
estimator, it will tend to give estimates far from $\theta$.


Example 12.2 A population consists of five numbers 2, 4, 6, 8, and 10. Consider all
possible samples of size 2 which can be drawn with replacement from this population. By
forming the sampling distributions, show that

(i) The sample variance $S^2 = \sum (X_i - \bar{X})^2 / n$ is a biased estimator of the population
variance $\sigma^2$.
(ii) The sample variance $\hat{S}^2 = \sum (X_i - \bar{X})^2 / (n - 1)$ is an unbiased estimator of the
population variance $\sigma^2$.
Solution. Population: 2, 4, 6, 8, 10    Population size: $N = 5$    Sample size: $n = 2$
The mean and variance of the population are

$\mu = \frac{\sum x}{N} = \frac{30}{5} = 6$

$\sigma^2 = \frac{\sum x^2}{N} - \left( \frac{\sum x}{N} \right)^2 = \frac{220}{5} - (6)^2 = 8$

Number of possible samples $= N \times N = 5 \times 5 = 25$

All possible samples:
(2, 2)    (2, 4)    (2, 6)    (2, 8)    (2, 10)
(4, 2)    (4, 4)    (4, 6)    (4, 8)    (4, 10)
(6, 2)    (6, 4)    (6, 6)    (6, 8)    (6, 10)
(8, 2)    (8, 4)    (8, 6)    (8, 8)    (8, 10)
(10, 2)   (10, 4)   (10, 6)   (10, 8)   (10, 10)

(i) All possible sample variances: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{(x_1 - x_2)^2}{4}$ when $n = 2$

 0    1    4    9   16
 1    0    1    4    9
 4    1    0    1    4
 9    4    1    0    1
16    9    4    1    0

The sampling distribution of $S^2$ and its mean are

Value of S^2    Number of occurrences    Probability
    s^2                  f               p(s^2) = f/Σf      s^2 p(s^2)
     0                   5                   5/25               0
     1                   8                   8/25              8/25
     4                   6                   6/25             24/25
     9                   4                   4/25             36/25
    16                   2                   2/25             32/25
                      Σf = 25                  1        Σ s^2 p(s^2) = 100/25

$E(S^2) = \sum s^2 p(s^2) = \frac{100}{25} = 4$

Since $4 = E(S^2) \neq \sigma^2 = 8$, therefore $S^2$ is a biased estimator of $\sigma^2$.

(ii) All possible sample variances: $\hat{s}^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(x_1 - x_2)^2}{2}$ when $n = 2$

 0    2    8   18   32
 2    0    2    8   18
 8    2    0    2    8
18    8    2    0    2
32   18    8    2    0

The sampling distribution of $\hat{S}^2$ and its mean are

Value of Ŝ^2    Number of occurrences    Probability
    ŝ^2                  f               p(ŝ^2) = f/Σf      ŝ^2 p(ŝ^2)
     0                   5                   5/25               0
     2                   8                   8/25             16/25
     8                   6                   6/25             48/25
    18                   4                   4/25             72/25
    32                   2                   2/25             64/25
                      Σf = 25                  1        Σ ŝ^2 p(ŝ^2) = 200/25

$E(\hat{S}^2) = \sum \hat{s}^2 p(\hat{s}^2) = \frac{200}{25} = 8$

Since $8 = E(\hat{S}^2) = \sigma^2 = 8$, therefore $\hat{S}^2$ is an unbiased estimator of $\sigma^2$.
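
The enumeration in Example 12.2 can also be carried out mechanically. The sketch below lists all 25 ordered samples drawn with replacement and averages the two sample variances using exact fractions. The helper name `sample_variance` is an assumption of this sketch and not notation from the text.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8, 10]
N, n = len(population), 2

# Population mean and variance (divisor N), kept as exact fractions.
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# All N**n ordered samples of size n drawn with replacement.
samples = list(product(population, repeat=n))

def sample_variance(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / divisor

# Each sample is equally likely, so the expectation is a plain average.
e_s2 = sum(sample_variance(s, n) for s in samples) / len(samples)
e_s2_hat = sum(sample_variance(s, n - 1) for s in samples) / len(samples)

print(mu, sigma2, e_s2, e_s2_hat)  # prints: 6 8 4 8
```

Changing `population` to another list reproduces the corresponding exercise results for sampling with replacement.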





12.5 BEST ESTIMATOR

Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from the distribution $f(x; \theta)$.
Among the class $U$ of all unbiased estimators $T = g(X_1, X_2, \ldots, X_n)$ for a given parameter
$\theta$, the estimator $T^*$ is said to be a best or minimum variance estimator if no estimator in
the class $U$ has a smaller variance than $T^*$.
12.5.1 Best Estimators of the Population Mean and Variance. Let $X_1, X_2, \ldots, X_n$ be a
random sample of size $n$ from a population with unknown mean $\mu$ and unknown variance $\sigma^2$,
then the best estimators of $\mu$ and $\sigma^2$ are

$\bar{X} = \frac{\sum X_i}{n}$  and  $\hat{S}^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$

respectively.


Example 12.3 Obtain the best unbiased estimates of the population mean $\mu$ and variance $\sigma^2$
from which the following sample is drawn:

$n = 8$, $\sum x_i = 120$, $\sum (x_i - \bar{x})^2 = 302$
Solution. The best estimate of the population mean $\mu$ is the sample mean

$\bar{x} = \frac{\sum x_i}{n} = \frac{120}{8} = 15$

The best estimate of the population variance $\sigma^2$ is the sample variance

$\hat{s}^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{302}{8 - 1} = 43.14$


12.5.2 Best Estimator of the Population Proportion. From a population which has unknown
proportion of successes $\pi$, we take a random sample of size $n$ with $X$ as the number of
successes in the sample, then the best estimator of $\pi$ is

$P = \frac{X}{n}$

Example 12.4 A random sample of 50 children from a large school is chosen and the number
who are left handed is noted. It is found that 6 are left handed. Obtain an unbiased estimate of
the proportion of children in the school who are left handed.

Solution. We have $n = 50$, $x = 6$

Sample proportion: $p = \frac{x}{n} = \frac{6}{50} = 0.12$


12.6 POOLED ESTIMATORS 
FROM TWO SAMPLES 


Estimates of the population mean, variance, proportion, etc., may be obtained by pooling
observations from two random samples.

12.6.1 Pooled Estimator of Population Mean. Let $X_{11}, X_{21}, \ldots, X_{n_1 1}$ and
$X_{12}, X_{22}, \ldots, X_{n_2 2}$ be two random samples of sizes $n_1$ and $n_2$ from a population
with unknown mean $\mu$, then the pooled estimator $\bar{X}_p$ of $\mu$ is

$\bar{X}_p = \frac{\sum_{i=1}^{n_1} X_{i1} + \sum_{i=1}^{n_2} X_{i2}}{n_1 + n_2} = \frac{n_1 \bar{X}_1 + n_2 \bar{X}_2}{n_1 + n_2}$

where $\bar{X}_1$ and $\bar{X}_2$ are the unbiased estimators of $\mu$, based on the first and the
second sample, respectively.

12.6.2 Pooled Estimator of Population Variance. Let $X_{11}, X_{21}, \ldots, X_{n_1 1}$ and
$X_{12}, X_{22}, \ldots, X_{n_2 2}$ be two random samples of sizes $n_1$ and $n_2$ from a population
with unknown variance $\sigma^2$, then the pooled estimator $S_p^2$ of $\sigma^2$ is

$S_p^2 = \frac{\sum_{i=1}^{n_1} (X_{i1} - \bar{X}_1)^2 + \sum_{i=1}^{n_2} (X_{i2} - \bar{X}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$

where $S_1^2$ and $S_2^2$ are the unbiased estimators of $\sigma^2$ (divisors $n_1 - 1$ and $n_2 - 1$),
based on the first and the second sample, respectively.


Example 12.5 Two samples of sizes 40 and 50, respectively, are taken from a population
with unknown mean $\mu$ and unknown variance $\sigma^2$.

Sample I:   $n_1 = 40$, $\sum f x_1 = 807$, $\sum f x_1^2 = 16329$

Sample II:  $n_2 = 50$, $\sum f x_2 = 977$, $\sum f x_2^2 = 19177$
Using the data from the two samples, obtain the best estimates of $\mu$ and $\sigma^2$.
Solution. The best estimates of $\mu$ and $\sigma^2$ are

$\bar{x}_p = \frac{\sum f x_1 + \sum f x_2}{n_1 + n_2} = \frac{807 + 977}{40 + 50} = 19.82$

$\sum f (x_1 - \bar{x}_1)^2 = \sum f x_1^2 - \frac{(\sum f x_1)^2}{n_1} = 16329 - \frac{(807)^2}{40} = 47.775$

$\sum f (x_2 - \bar{x}_2)^2 = \sum f x_2^2 - \frac{(\sum f x_2)^2}{n_2} = 19177 - \frac{(977)^2}{50} = 86.42$

$s_p^2 = \frac{\sum f (x_1 - \bar{x}_1)^2 + \sum f (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2} = \frac{47.775 + 86.42}{40 + 50 - 2} = 1.525$
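
The pooled estimates of Example 12.5 follow directly from the summary totals. The sketch below repeats the computation; the variable names are assumptions made for illustration.

```python
# Summary totals from Example 12.5 (variable names are assumptions of this sketch).
n1, sum_x1, sum_x1_sq = 40, 807, 16329
n2, sum_x2, sum_x2_sq = 50, 977, 19177

# Pooled mean: all observations combined.
pooled_mean = (sum_x1 + sum_x2) / (n1 + n2)

# Sums of squared deviations about each sample's own mean.
ss1 = sum_x1_sq - sum_x1 ** 2 / n1
ss2 = sum_x2_sq - sum_x2 ** 2 / n2

# Pooled variance: combined sum of squares over combined degrees of freedom.
pooled_var = (ss1 + ss2) / (n1 + n2 - 2)

print(round(pooled_mean, 2), round(ss1, 3), round(ss2, 2), round(pooled_var, 3))
```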




12.6.3 Pooled Estimator of Population Proportion. From a population which has unknown
proportion of successes $\pi$, we take two random samples of sizes $n_1$ and $n_2$ with $X_1$ and
$X_2$ as the number of successes in the respective sample, then the pooled estimator $P_p$ of
$\pi$ is

$P_p = \frac{X_1 + X_2}{n_1 + n_2} = \frac{n_1 P_1 + n_2 P_2}{n_1 + n_2}$

where $P_1$ and $P_2$ are the unbiased estimators of $\pi$, based on the first and second sample,
respectively.

Example 12.6 A random sample of 600 people from a certain district were questioned and
the results indicated that 30% used a particular product. In a second random sample of 300
people, 96 used the product. Using the data from the two samples, find the best estimate of the
proportion of people in the district who used the product.

Solution. The best estimate of the population proportion is obtained as follows.

$n_1 = 600$, $p_1 = 0.30$
$n_2 = 300$, $x_2 = 96$, $p_2 = \frac{x_2}{n_2} = \frac{96}{300} = 0.32$

$p_p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{600(0.30) + 300(0.32)}{600 + 300} = 0.307$

Exercise 12.1


1. (a) Explain what is meant by statistical inference.
(b) What is meant by estimation? Differentiate between an estimator and an estimate.
2. (a) Specify the estimator and the estimate in each of the following:
(i) A sample of 35 students gave an average height of 62 inches.
(ii) A sample of 50 households having television sets showed that 85 percent of
them liked a particular programme.
(iii) A sample of 25 bolts produced by a company showed that 20 of them were
according to specifications.
(iv) A sample of 30 houses showed an average consumption of electricity as
65 units.
{ (i) Sample mean height $\bar{X}$; $\bar{x} = 62$ inches.
(ii) Sample proportion of households who liked the particular programme $P$; $p = 0.85$.
(iii) Sample number of bolts according to specifications $X$; $x = 20$, or sample
proportion of bolts according to specifications $P$; $p = 0.80$.
(iv) Sample mean consumption of electricity $\bar{X}$; $\bar{x} = 65$ units. }

(b) Suppose I choose a random sample of three observations from a population and obtain
the values 2, 5, 3. From these values I estimate the centre of the population by ranking
the observations and taking the middle one. What estimator am I using and what is my
estimate?

{ Sample median $X_{0.5} = X_{((n+1)/2)}$; $x_{0.5} = x_{((n+1)/2)} = 3$ }





3. (a) Is an estimator a random variable? Why or why not?
{ Yes. An estimator is a random variable having its own probability distribution. }

(b) Why do we call the standard deviation of a sample statistic the standard error of the
statistic?
{ In the context of estimation, the deviation of a sample statistic $T$ from its target
$\theta$ (the parameter to be estimated) must be considered an error. So the standard deviation
of a sample statistic is commonly called the standard error of the sample statistic. }

4. (a) What is meant by unbiasedness? Differentiate between an unbiased and a biased
estimator.

(b) A finite population consists of the numbers 3, 5, 7 and 9. Take all possible samples of
size 2 which can be drawn with replacement from this population. By forming the
sampling distributions of $\bar{X}$ and $S^2$ show that
(i) the sample mean $\bar{X} = \sum X_i / n$ is an unbiased estimator of the population
mean $\mu$.
(ii) the sample variance $S^2 = \sum (X_i - \bar{X})^2 / n$ is a biased estimator of the
population variance $\sigma^2$.

{ (i) $\mu = 6$, $E(\bar{X}) = 6$, $E(\bar{X}) = \mu$, (ii) $\sigma^2 = 5$, $E(S^2) = 2.5$, $E(S^2) \neq \sigma^2$ }

(c) Draw all possible samples of size 3 taken without replacement from the population
7, 10, 13 and 16. By forming the sampling distributions, show that both sample mean
$\bar{X}$ and sample median $X_{0.5}$ are unbiased estimators.

5. (a) A finite population consists of the numbers 1, 3, 5, 7 and 9. Consider all possible
samples of size two which can be drawn with replacement from this population. By
forming the sampling distributions of $\bar{X}$, $S^2$ and $\hat{S}^2$ show that
(i) the sample mean $\bar{X} = \sum X_i / n$ is an unbiased estimator of the population
mean $\mu$.
(ii) the sample variance $S^2 = \sum (X_i - \bar{X})^2 / n$ is a biased estimator of the
population variance $\sigma^2$.
(iii) the sample variance $\hat{S}^2 = \sum (X_i - \bar{X})^2 / (n - 1)$ is an unbiased estimator of
the population variance $\sigma^2$.

{ (i) $\mu = 5$, $E(\bar{X}) = 5$, $E(\bar{X}) = \mu$, (ii) $\sigma^2 = 8$, $E(S^2) = 4$, $E(S^2) \neq \sigma^2$,
(iii) $\sigma^2 = 8$, $E(\hat{S}^2) = 8$, $E(\hat{S}^2) = \sigma^2$ }

(b) A finite population consists of the numbers 2, 3, 4, 5, 6 and 8. Find the proportion $P$
of even numbers in all possible random samples of size $n = 2$ that can be drawn with
replacement from this population. By forming the sampling distribution of sample
proportions show that sample proportion is an unbiased estimator of the population
proportion. Also verify the relation

$Var(P) = \frac{\pi (1 - \pi)}{n}$

where $P$ and $\pi$ are sample and population proportions respectively.
{ $\pi = 2/3$, $\mu_P = 2/3$, $\sigma_P^2 = 1/9$ }

(c) A finite population consists of the numbers 4, 5, 6 and 8. Find the proportion $P$ of
even numbers in all possible random samples of size $n = 3$ that can be drawn without
replacement from this population. By forming the sampling distribution of sample
proportions show that sample proportion is an unbiased estimator of the population
proportion. Also verify the relation

$Var(P) = \frac{\pi (1 - \pi)}{n} \cdot \frac{N - n}{N - 1}$

where $P$ and $\pi$ are sample and population proportions respectively.
{ $\pi = 3/4$, $\mu_P = 3/4$, $\sigma_P^2 = 1/48$ }



12.7. INTERVAL ESTIMATION 


Interval estimation is a procedure of constructing an interval from a random sample, 
such that prior to sampling, it has a high specified probability of including the unknown true 
value of a population parameter. 


12.7.1 Need for Interval Estimation. Any point estimate has the limitation that it does not
provide information about the precision of the estimate, i.e., about the magnitude of error due to
sampling. Often such information is essential for proper interpretation of the sample result.

A point estimator, calculated from the sample data, provides a single number as an
estimate of the parameter. This single number lies in the forefront even though a statement of
accuracy in terms of the standard error is attached to it. A point estimate, however efficient the
estimator may be, cannot be expected to be exactly equal to the population parameter. Moreover,
we cannot assess by looking at only one value (the point estimate) how close the estimate is to
the unknown true value of the parameter being estimated. A point estimate by itself does not
supply this information about its precision.


An alternative approach to estimation is to extend the concept of error bound to produce 
an interval of values that is likely to include the unknown true value of the parameter. This is the 
concept underlying estimation by confidence intervals. 


12.7.2 Interval Estimate. An interval estimate is an interval calculated from a random sample,
such that prior to sampling, it has a high specified probability of including the unknown true
value of a population parameter.

Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with unknown parameter $\theta$.
A confidence interval for $\theta$ is an interval $(L, U)$ computed from the sample observations
$X_1, X_2, \ldots, X_n$, such that prior to sampling, it includes the unknown true value of $\theta$ with a
specified high probability. Let $(1 - \alpha)$ be a specified high probability and $L$ and $U$ be
functions of sample observations $X_1, X_2, \ldots, X_n$ such that

$P(L < \theta < U) = 1 - \alpha$  for  $0 < \alpha < 1$

Then the interval $(L, U)$ is called a $100(1 - \alpha)\%$ confidence interval for the
parameter $\theta$, and the probability $(1 - \alpha)$ is called the confidence coefficient or the level of
confidence. Note that $(1 - \alpha)$ is the probability that the random interval $(L, U)$ includes the
parameter $\theta$, and not the probability that $\theta$ lies in a particular interval computed from an
observed sample. The end points $L$ and $U$ that bound the confidence interval are called the
lower and upper confidence limits for the parameter $\theta$. These limits, being functions of the
sample observations, are random variables. The width $U - L$ of the confidence interval measures
the precision of the estimate. The shorter the confidence interval, the more precise the estimate
will be. The precision can be increased by

(i) decreasing the standard error of the estimate (i.e., increasing the sample size).
(ii) decreasing the confidence coefficient.
12.7.3 Confidence Coefficient. 


Meaning of Confidence Coefficient. From the definition of a confidence interval, we know that,
prior to selecting the random sample, the probability is $1 - \alpha$ that the confidence interval we
obtain will include the population parameter $\theta$. The particular confidence interval result will be
either correct or incorrect, and we do not know for certain which is the case.
Selecting the Confidence Coefficient. We should like the confidence interval to be very precise
(i.e., very narrow) and would like to be very confident that it includes $\theta$. Unfortunately, for any
fixed sample size, the confidence coefficient can only be increased by increasing the width of the
confidence interval. The confidence interval widens rapidly as the confidence coefficient gets
near 100 percent.

The choice of $1 - \alpha$ will vary from case to case, depending on how much risk of
obtaining an incorrect interval can be taken. The numerical confidence coefficient (e.g., 0.95)
is often expressed as a percent (e.g., 95%). Confidence coefficients of 90, 95, 98, and 99
percent are often used in practice.
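
The trade-off between confidence and width can be seen numerically. The sketch below assumes the two-sided z-based interval for the mean of a normal population with known standard deviation, whose width is $2 z_{1-\alpha/2} \sigma / \sqrt{n}$. The values `sigma = 10` and `n = 16` and the function names are hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_critical(conf):
    """Return z_{1-alpha/2} for a two-sided interval with confidence coefficient conf."""
    alpha = 1 - conf
    return NormalDist().inv_cdf(1 - alpha / 2)

def ci_width(conf, sigma, n):
    """Width of the z-based interval for a normal mean with known sigma (assumed setting)."""
    return 2 * z_critical(conf) * sigma / sqrt(n)

sigma, n = 10, 16  # hypothetical population standard deviation and sample size
for conf in (0.90, 0.95, 0.98, 0.99, 0.999):
    print(f"{conf:.3f}  z = {z_critical(conf):.3f}  width = {ci_width(conf, sigma, n):.2f}")
```

For a fixed sample size the printed widths grow with the confidence coefficient, and the growth accelerates as the coefficient approaches 1.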


12.8 CONFIDENCE INTERVAL 

FOR POPULATION MEAN, 

The interval $(L, U)$ is a $100(1 - \alpha)\%$ confidence interval for the population mean
$\mu$ if prior to sampling

$P(L < \mu < U) = 1 - \alpha$

This definition simply states that a confidence interval with confidence coefficient
$1 - \alpha$ is an interval estimate such that the probability is $1 - \alpha$ that the calculated limits
include $\mu$ for any random sample. In other words, in many repeated random samples of size $n$
from a population, $100(1 - \alpha)\%$ of the interval estimates will include $\mu$ and therefore will be
correct, and $100\alpha\%$ of the interval estimates will not include $\mu$ and therefore will be
incorrect. The choice of method used in constructing a confidence interval for $\mu$ depends upon
whether or not the population is normal, whether the population variance $\sigma^2$ is known or
unknown, and whether the sample size $n$ is large or small. We discuss these different cases below.


12.8.1 Normal Population, $\sigma^2$ known. Suppose that a random sample $X_1, X_2, \ldots, X_n$ of
size $n$ is drawn from a normal population with unknown mean $\mu$ and known variance $\sigma^2$. We
wish to construct a confidence interval which is likely to include the true unknown value of the
population mean $\mu$ with a degree of confidence $1 - \alpha$. We know that the sampling distribution
of $\bar{X}$ will be normal with mean $\mu$ and variance $\sigma^2 / n$. Consequently, the distribution of the
statistic

$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

will be normal with mean 0 and variance 1. Then a two-sided $100(1 - \alpha)\%$ confidence
interval for population mean $\mu$ is given by

$\bar{X} - z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}$

If $\bar{x}$ is the mean of an observed random sample of size $n$ taken from a normal population with
unknown mean $\mu$ and known variance $\sigma^2$, then a $100(1 - \alpha)\%$ confidence interval for $\mu$ is
given by

$\bar{x} - z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}$

This can be written $\bar{x} \pm z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}$

Note that, often the word central is omitted when considering confidence intervals, but it is
assumed that a two-sided interval that is central, or symmetric about the mean, is required.
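
The step from the standard normal statistic to the interval for $\mu$ is not spelled out above. One way to fill it in, using only the fact that $Z$ is standard normal and that $z_{1-\alpha/2}$ cuts off probability $\alpha/2$ in the upper tail, is the following sketch:

```latex
\begin{align*}
P\left(-z_{1-\alpha/2} < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < z_{1-\alpha/2}\right) &= 1-\alpha \\
P\left(-z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \bar{X}-\mu < z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) &= 1-\alpha \\
P\left(\bar{X}-z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X}+z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right) &= 1-\alpha
\end{align*}
```

The second line multiplies through by $\sigma/\sqrt{n}$; the third subtracts $\bar{X}$ and multiplies by $-1$, which reverses the inequalities.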


12.8.2 Interpretation of a Confidence Interval. A $100(1 - \alpha)\%$ confidence interval for
$\mu$ is

$\bar{X} - z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}$

If we identify

$L = \bar{X} - z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}$  and  $U = \bar{X} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}}$

the probability statement implies that prior to sampling, the random interval $(L, U)$ will include
the parameter $\mu$ with a probability $1 - \alpha$. That is

$P\left( \bar{X} - z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} \right) = 1 - \alpha$

It is to be emphasized that in this expression, $\mu$ is constant and it is the end points which are
random variables. To better understand the meaning of a confidence statement, suppose we
perform repeated samplings from a normal distribution with mean $\mu$ and standard deviation
$\sigma$ and compute a $100(1 - \alpha)\%$ confidence interval $\bar{x} \pm z_{1-\alpha/2}\, \sigma / \sqrt{n}$ from each
random sample. Then approximately $100(1 - \alpha)\%$ of the intervals derived would contain the
true value of $\mu$.


[Fig. 12.1 Repeated forming of confidence intervals for $\mu$]
[Fig. 12.2 Two-sided confidence interval for $\mu$]

Figure 12.1 shows what would typically happen if a number of samples were drawn
from the same population and a confidence interval for $\mu$ were computed for each sample. The
true value of $\mu$ is indicated by a vertical line in the figure. Different confidence intervals,
resulting from different random samples, are shown as horizontal line segments. Most of the
confidence intervals would contain $\mu$, but some of them would not contain $\mu$. If a 95%
confidence interval were calculated for each sample, then in the long run 95% of the confidence
intervals that were formed would contain $\mu$. That is not surprising, because the specified
probability 0.95 represents the long-run relative frequency of these intervals crossing the vertical
line.

Figure 12.2 represents that, before the sample is taken, the probability is $1 - \alpha$ that
the quantity $(\bar{X} - \mu) / (\sigma / \sqrt{n})$ will fall in the shaded interval. The interval estimate
$\bar{X} \pm z_{1-\alpha/2}\, \sigma / \sqrt{n}$ will be correct (i.e., will include $\mu$) if $(\bar{X} - \mu) / (\sigma / \sqrt{n})$ does fall in
the shaded interval. In effect, the risk $\alpha$ of an incorrect confidence interval is divided equally in
the two tails of the standard normal distribution.
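
The long-run interpretation can be illustrated by simulation. The sketch below repeatedly draws samples from a normal population with known standard deviation, forms the z-based interval each time, and reports the fraction of intervals that include the true mean. The function name `coverage`, the seed, and the parameter values in the usage line are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, conf, repetitions, seed=0):
    """Fraction of simulated z-based intervals that include the true mean mu."""
    rng = random.Random(seed)                      # fixed seed so runs are repeatable
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{1-alpha/2}
    half_width = z * sigma / sqrt(n)
    hits = 0
    for _ in range(repetitions):
        x_bar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if x_bar - half_width < mu < x_bar + half_width:
            hits += 1
    return hits / repetitions

# Hypothetical setting: mu = 50, sigma = 10, n = 16, 95% intervals, 2000 repetitions.
print(coverage(50, 10, 16, 0.95, 2000))
```

The printed fraction should be close to the confidence coefficient, though it varies slightly with the seed and the number of repetitions.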


12.8.3 Steps to Follow When Forming a Confidence Interval. We will follow the following
standard format when estimating parameters with confidence intervals:

(i) Identify the population of interest, and state the conditions required for the validity of the
procedure being used to construct the confidence interval.
(ii) Give the procedure (formula) that will be used.
(iii) Construct the confidence interval.
(iv) Interpret the results.


Example 12.7 A normal population has a variance of 100..A random sample of size 16 
Selected-from the population has a mean of 52.5. Construct the 90% confidence interval 
estimate of the population mean, jt Interpret the result. — 


Solution. The size and mean of the sample and the variance of the normal population are
$$n = 16, \quad \bar{x} = 52.5, \quad \sigma^2 = 100 \;\Rightarrow\; \sigma = 10$$

Confidence coefficient: $1-\alpha = 0.90$
$$1-\alpha = 0.90 \;\Rightarrow\; \alpha = 0.10 \;\Rightarrow\; \alpha/2 = 0.05 \;\Rightarrow\; 1-\alpha/2 = 0.95$$
$$z_{1-\alpha/2} = z_{0.95} = 1.645 \quad \{\text{From Table 10 (b)}\}$$

The two-sided 90% confidence interval for $\mu$ is
$$\bar{x} - z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
$$52.5 - (1.645)\,\frac{10}{\sqrt{16}} < \mu < 52.5 + (1.645)\,\frac{10}{\sqrt{16}}$$
$$48.4 < \mu < 56.6$$

A two-sided 90% confidence interval for $\mu$ obtained from the observed sample is (48.4, 56.6).
We are 90% confident that the interval estimate contains $\mu$.


Example 12.8 Unoccupied seats on flights cause the airlines to lose revenue. Suppose a large
airline obtained the 90% confidence interval for the average number of unoccupied seats per
flight, on the basis of the records of 225 randomly selected flights over the past year, as
11.15 to 12.05. Find the value of $\bar{x}$, the mean of the sample, and $\sigma$, the standard deviation of
the normal population from which the sample was drawn. Estimate the average number of
unoccupied seats per flight over the past year with 99% confidence coefficient.
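Solution. Sample size $n = 225$

Confidence coefficient: $1-\alpha = 0.90$
$$1-\alpha = 0.90 \;\Rightarrow\; \alpha = 0.10 \;\Rightarrow\; \alpha/2 = 0.05 \;\Rightarrow\; 1-\alpha/2 = 0.95$$
$$z_{1-\alpha/2} = z_{0.95} = 1.645 \quad \{\text{From Table 10 (b)}\}$$

The two-sided $100(1-\alpha)\%$ confidence interval for $\mu$ is
$$\bar{x} - z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

The lower and upper limits of the 90% confidence interval for $\mu$ are 11.15 and 12.05. Thus
$$\bar{x} - z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 11.15 \;\Rightarrow\; \bar{x} - (1.645)\,\frac{\sigma}{\sqrt{225}} = 11.15 \qquad \text{(i)}$$
$$\bar{x} + z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} = 12.05 \;\Rightarrow\; \bar{x} + (1.645)\,\frac{\sigma}{\sqrt{225}} = 12.05 \qquad \text{(ii)}$$

Adding (i) and (ii), we get
$$2\bar{x} = 23.20 \;\Rightarrow\; \bar{x} = 11.6$$
Putting the value of $\bar{x}$ in (ii), we get
$$11.6 + (1.645)\,\frac{\sigma}{\sqrt{225}} = 12.05 \;\Rightarrow\; \sigma = 4.1$$

Confidence coefficient: $1-\alpha = 0.99$
$$1-\alpha = 0.99 \;\Rightarrow\; \alpha = 0.01 \;\Rightarrow\; \alpha/2 = 0.005 \;\Rightarrow\; 1-\alpha/2 = 0.995$$
$$z_{1-\alpha/2} = z_{0.995} = 2.576 \quad \{\text{From Table 10 (b)}\}$$

The two-sided 99% confidence interval for $\mu$ is
$$\bar{x} - z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
$$11.6 - (2.576)\,\frac{4.1}{\sqrt{225}} < \mu < 11.6 + (2.576)\,\frac{4.1}{\sqrt{225}}$$
$$10.9 < \mu < 12.3$$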
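The arithmetic of Examples 12.7 and 12.8 can be reproduced with a short sketch. The function names below are illustrative assumptions, and the quantile is taken from Python's `statistics.NormalDist` rather than from Table 10, so it carries a few more decimals than the tabulated 1.645 and 2.576.

```python
from math import sqrt
from statistics import NormalDist

def z_quantile(conf):
    """z_{1 - alpha/2} for a two-sided interval with confidence coefficient conf."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

def z_interval(xbar, sigma, n, conf):
    """Two-sided interval for mu when sigma is known: xbar +/- z * sigma / sqrt(n)."""
    half_width = z_quantile(conf) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def mean_and_sigma_from_interval(lower, upper, n, conf):
    """Invert the interval: the midpoint gives xbar, the half-width gives sigma."""
    xbar = (lower + upper) / 2
    sigma = (upper - lower) / 2 * sqrt(n) / z_quantile(conf)
    return xbar, sigma

# Example 12.7: n = 16, xbar = 52.5, sigma = 10, 90% confidence
lo, hi = z_interval(52.5, 10, 16, 0.90)
print(round(lo, 1), round(hi, 1))

# Example 12.8: recover xbar and sigma from the 90% limits, then form the 99% interval
xbar, sigma = mean_and_sigma_from_interval(11.15, 12.05, 225, 0.90)
lo99, hi99 = z_interval(xbar, sigma, 225, 0.99)
print(round(xbar, 1), round(sigma, 1), round(lo99, 1), round(hi99, 1))
```

Because the sample mean is the midpoint of a symmetric interval, adding equations (i) and (ii) in the solution is the same step as averaging the two limits here.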


12.8.4 Any Population, $\sigma^2$ known/unknown, $n$ large. Suppose that a random sample
$X_1, X_2, \ldots, X_n$ of size $n$ is drawn from a population with mean $\mu$ and variance $\sigma^2$. We
wish to construct a confidence interval which is likely to trap the true unknown value of the
population mean $\mu$ with a degree of confidence $1-\alpha$. If the population is not normal, and if
$\sigma^2$ is either known or unknown, then according to the Central Limit Theorem the sampling
distribution of $\bar{X}$ is approximately normal with mean $\mu$ and variance $\sigma^2/n$ (when $\sigma^2$ is
known, and estimated by $S^2/n$ when $\sigma^2$ is unknown) if the sample size is sufficiently large, say, $n > 30$.
Consequently the distribution of the statistic
$$Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \approx \frac{\bar{X}-\mu}{S/\sqrt{n}}$$
is approximately normal with mean 0 and variance 1. Then a two-sided $100(1-\alpha)\%$
approximate confidence interval for the population mean $\mu$ is given by
$$\bar{X} - z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$

We now turn to the more realistic situation in which the population variance $\sigma^2$ is unknown.
Because $n$ is large, replacing $\sigma$ with its sample estimator $S$ does not appreciably
affect the probability statement. When $n$ is large and the population variance $\sigma^2$ is unknown, a
$100(1-\alpha)\%$ approximate confidence interval for the population mean $\mu$ is given by
$$\bar{X} - z_{1-\alpha/2}\,\frac{S}{\sqrt{n}} < \mu < \bar{X} + z_{1-\alpha/2}\,\frac{S}{\sqrt{n}}$$

If $\bar{x}$ and $s$ are the mean and standard deviation of an observed random sample of size $n$,
sufficiently large, from a population with unknown mean $\mu$ and unknown but finite variance $\sigma^2$,
then a $100(1-\alpha)\%$ approximate confidence interval for $\mu$ is given by
$$\bar{x} - z_{1-\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}$$

This can be written $\bar{x} \pm z_{1-\alpha/2}\,\dfrac{s}{\sqrt{n}}$.


12.8.5 Sampling Without Replacement. When sampling is done without replacement from a
finite population of size $N$, the standard error of $\bar{X}$ is given by
$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$

If the sample size $n$ is greater than 5% of the population size $N$ (i.e., $n > 0.05N$), then a
$100(1-\alpha)\%$ confidence interval for $\mu$ is given by
$$\bar{X} \pm z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
However, when $n$ is large and the population variance $\sigma^2$ is unknown, a $100(1-\alpha)\%$
approximate confidence interval for the population mean $\mu$ is given by
$$\bar{X} \pm z_{1-\alpha/2}\,\frac{S}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$

If $\bar{x}$ is the mean of an observed random sample of size $n$ taken from a population with unknown
mean $\mu$ and known variance $\sigma^2$, then a $100(1-\alpha)\%$ confidence interval for $\mu$ is given by
$$\bar{x} \pm z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$

If $\bar{x}$ and $s$ are the mean and standard deviation of an observed random sample of size $n$,
sufficiently large, from a population with unknown mean $\mu$ and unknown but finite variance $\sigma^2$,
then a $100(1-\alpha)\%$ approximate confidence interval for $\mu$ is given by
$$\bar{x} \pm z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$

The finite population correction $(N-n)/(N-1)$ may be ignored when the sample size $n$ is
less than 5% of the population size $N$ (i.e., $n < 0.05N$).


Example 12.9 A particular component in a transistor circuit has a lifetime which is known to
follow a skew distribution. A random sample of 250 components from a week's production gave
an average lifetime of 840 hours, and the variance of lifetimes is 483 (hours$^2$). Find
approximately 95% confidence limits to the true mean lifetime in the whole population of the
product.

Solution. The size, mean and variance of the sample are
$$n = 250, \quad \bar{x} = 840, \quad s^2 = 483 \;\Rightarrow\; s = 21.98$$
Confidence coefficient: $1-\alpha = 0.95$
$$1-\alpha = 0.95 \;\Rightarrow\; \alpha = 0.05 \;\Rightarrow\; \alpha/2 = 0.025 \;\Rightarrow\; 1-\alpha/2 = 0.975$$
$$z_{1-\alpha/2} = z_{0.975} = 1.960 \quad \{\text{From Table 10 (b)}\}$$

The two-sided 95% approximate confidence interval for $\mu$ is
$$\bar{x} - z_{1-\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}$$
$$840 - (1.960)\,\frac{21.98}{\sqrt{250}} < \mu < 840 + (1.960)\,\frac{21.98}{\sqrt{250}}$$
$$837.3 < \mu < 842.7$$

Example 12.10 A random sample of size $n = 200$, selected without replacement from a finite
population of size $N = 1000$ with $\sigma = 1.28$, showed that $\bar{x} = 68.6$. Construct a 97%
confidence interval for the mean $\mu$ of the population.
Solution. The size and mean of the sample and the size and standard deviation of the population are
$$n = 200, \quad \bar{x} = 68.6, \quad N = 1000, \quad \sigma = 1.28$$
Confidence coefficient: $1-\alpha = 0.97$
$$1-\alpha = 0.97 \;\Rightarrow\; \alpha = 0.03 \;\Rightarrow\; \alpha/2 = 0.015 \;\Rightarrow\; 1-\alpha/2 = 0.985$$
$$z_{1-\alpha/2} = z_{0.985} = 2.17 \quad \{\text{From Table 10 (a)}\}$$

The two-sided 97% approximate confidence interval for $\mu$ is
$$\bar{x} \pm z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
$$68.6 \pm (2.17)\,\frac{1.28}{\sqrt{200}}\sqrt{\frac{1000-200}{1000-1}}$$
$$(68.42,\ 68.78) \;\Rightarrow\; 68.42 < \mu < 68.78$$
Example 12.11 An auditor has selected a simple random sample of 100 accounts from the
8042 accounts receivable of a freight company to estimate the total audit amount of the
receivables in the population. The sample mean is $\bar{x} = 33.19$ and the sample standard deviation
is $s = 34.48$. Obtain the 95.44 percent confidence interval for the mean audit amount in the
population.
Solution. The size, mean and standard deviation of the sample and the population size are
$$n = 100, \quad \bar{x} = 33.19, \quad s = 34.48, \quad N = 8042$$
Confidence coefficient: $1-\alpha = 0.9544$
$$1-\alpha = 0.9544 \;\Rightarrow\; \alpha = 0.0456 \;\Rightarrow\; \alpha/2 = 0.0228 \;\Rightarrow\; 1-\alpha/2 = 0.9772$$
$$z_{1-\alpha/2} = z_{0.9772} = 2 \quad \{\text{From Table 10 (a)}\}$$

The two-sided 95.44% approximate confidence interval for the mean audit amount $\mu$ is
$$\bar{x} \pm z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}\sqrt{\frac{N-n}{N-1}}$$
$$33.19 \pm (2)\,\frac{34.48}{\sqrt{100}}\sqrt{\frac{8042-100}{8042-1}}$$
$$(26.3366,\ 40.0434) \;\Rightarrow\; 26.3366 < \mu < 40.0434$$
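The finite population correction used in Examples 12.10 and 12.11 can be sketched as follows. The function name `fpc_interval` is an assumption made for this illustration, and the critical values 2.17 and 2 are passed in as read from Table 10.

```python
from math import sqrt

def fpc_interval(xbar, sd, n, N, z):
    """Interval for mu when sampling without replacement from a finite population.

    sd is sigma when it is known, or the sample standard deviation s when n is large.
    z is the tabulated critical value z_{1 - alpha/2}.
    """
    standard_error = sd / sqrt(n) * sqrt((N - n) / (N - 1))
    return xbar - z * standard_error, xbar + z * standard_error

# Example 12.10: n = 200, N = 1000, sigma = 1.28, xbar = 68.6, z = 2.17
print(fpc_interval(68.6, 1.28, 200, 1000, 2.17))

# Example 12.11: n = 100, N = 8042, s = 34.48, xbar = 33.19, z = 2
print(fpc_interval(33.19, 34.48, 100, 8042, 2.0))
```

In Example 12.11 the sample is only about 1.2% of the population, so the correction factor is close to 1 and changes the limits very little; in Example 12.10 the sample is 20% of the population and the correction shortens the interval noticeably.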










12.8.6 Normal Population, $\sigma^2$ unknown, $n$ small. Suppose that a random sample $X_1, X_2,
\ldots, X_n$ of size $n$ is drawn from a normal population with unknown mean $\mu$ and unknown
variance $\sigma^2$. We wish to construct a confidence interval which is likely to contain the true
unknown value of the population mean $\mu$ with a confidence coefficient $1-\alpha$. However, time and
cost restrictions would probably limit the sample size to a small number. Many inferences in
business must be made on the basis of very limited information, i.e., small samples. When the
population is normal, the sampling distribution of the statistic
$$T = \frac{\bar{X}-\mu}{S/\sqrt{n}}$$
is a $t$-distribution with $\nu = n-1$ degrees of freedom. Thus, when $n$ is small, the population is
normal and the population variance is unknown, a two-sided $100(1-\alpha)\%$ confidence interval for
the population mean $\mu$ is given by
$$\bar{X} - t_{\nu;\,1-\alpha/2}\,\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\nu;\,1-\alpha/2}\,\frac{S}{\sqrt{n}}$$
If $\bar{x}$ and $s$ are the mean and standard deviation of an observed random sample of size $n$ from
a normal population with unknown mean $\mu$ and unknown but finite variance $\sigma^2$, then a
$100(1-\alpha)\%$ confidence interval for $\mu$ is given by
$$\bar{x} - t_{\nu;\,1-\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\nu;\,1-\alpha/2}\,\frac{s}{\sqrt{n}}$$

This can be written $\bar{x} \pm t_{\nu;\,1-\alpha/2}\,\dfrac{s}{\sqrt{n}}$.

For large degrees of freedom (e.g., beyond the range of Table 12) the $t$-distribution can be
approximated by a standard normal distribution.


Example 12.12 Ten packets of a particular brand of biscuits are chosen at random and their
mass measured in grams. The results are
$$n = 10, \quad \sum x = 3978.7, \quad \sum x^2 = 1583098.3$$

Assuming that the sample is taken from a normal population with mean mass $\mu$, calculate the
98% confidence interval for $\mu$.

Solution. The mean and standard deviation of the sample are
$$\bar{x} = \frac{\sum x}{n} = \frac{3978.7}{10} = 397.87$$
$$s = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n-1}} = \sqrt{\frac{1583098.3 - 10(397.87)^2}{10-1}} = 3.213$$

Confidence coefficient: $1-\alpha = 0.98$
$$1-\alpha = 0.98 \;\Rightarrow\; \alpha = 0.02 \;\Rightarrow\; \alpha/2 = 0.01 \;\Rightarrow\; 1-\alpha/2 = 0.99$$
Degrees of freedom: $\nu = n - 1 = 10 - 1 = 9$
$$t_{\nu;\,1-\alpha/2} = t_{9;\,0.99} = 2.821 \quad (\text{From Table 12})$$
The two-sided 98% confidence interval for $\mu$ is
$$\bar{x} - t_{\nu;\,1-\alpha/2}\,\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\nu;\,1-\alpha/2}\,\frac{s}{\sqrt{n}}$$
$$397.87 - (2.821)\,\frac{3.213}{\sqrt{10}} < \mu < 397.87 + (2.821)\,\frac{3.213}{\sqrt{10}}$$
$$395.004 < \mu < 400.736$$
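The steps of Example 12.12 can be sketched in code, working directly from the sums $\sum x$ and $\sum x^2$. The function name is an assumption for illustration; the critical value $t_{9;\,0.99} = 2.821$ is passed in as read from Table 12, and the value $\sum x = 3978.7$ is the one implied by $\bar{x} = 397.87$ in the solution.

```python
from math import sqrt

def t_interval_from_sums(n, sum_x, sum_x2, t_crit):
    """Small-sample interval for mu from n, sum(x), sum(x^2) and a tabulated t value."""
    xbar = sum_x / n
    # Sample standard deviation with divisor n - 1, as in the worked solution
    s = sqrt((sum_x2 - n * xbar ** 2) / (n - 1))
    half_width = t_crit * s / sqrt(n)
    return xbar, s, (xbar - half_width, xbar + half_width)

# Example 12.12: n = 10, sum x = 3978.7 (implied by xbar = 397.87), sum x^2 = 1583098.3
xbar, s, (lo, hi) = t_interval_from_sums(10, 3978.7, 1583098.3, 2.821)
print(round(xbar, 2), round(s, 3), lo, hi)
```

The limits printed here may differ from the worked answer in the third decimal place, because the solution rounds $s$ to 3.213 before multiplying.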


Exercise 12.2

1. (a) What is meant by estimation? Distinguish between point estimate and interval estimate.
Why is an interval estimate more useful?

(b) Distinguish between the following
(i) Estimator and estimate,
(ii) Point and interval estimation.

2. (a) Explain what is meant by
(i) Confidence interval,
(ii) Confidence limits,
(iii) Confidence coefficient.

(b) When would a confidence interval be preferred over point estimation for a parameter?
( When the reliability of a point estimator is needed, the confidence interval conveniently
expresses the estimator along with its measure of variation. Reliability is reported through
the confidence coefficient and the variation is reflected in the length of the interval. )

3. (a) Find a 90% confidence interval for the mean of a normal distribution with $\sigma = 3$,
given the sample 2.3, -0.2, -0.4, -0.9.
( $-2.268 < \mu < 2.668$ )

(b) The standard deviation of the amounts poured into bottles by an automatic filling
machine is 1.8 ml (millilitre). The amounts of fill in a random sample of bottles, in
ml, were 481, 479, 482, 480, 477, 478, 481 and 482. Suppose the population of
amounts of fill is normal. Construct a 90% confidence interval for the mean amount in
all bottles filled by the machine.
( $478.95 < \mu < 481.05$ )

4. (a) A random sample of size 36 is taken from a normal population with a known variance
$\sigma^2 = 25$. If the mean of the sample is 42.6, find 95% confidence limits for the
population mean.
( $40.967 < \mu < 44.233$ )

(b) A school wishes to estimate the average weight of students in the sixth grade. A random
sample of $n = 25$ is selected and the sample mean is found to be $\bar{x} = 100$ lbs. The
standard deviation of the population is known to be 15 lbs. Compute 90% confidence
interval for the population mean.
( $95.065 < \mu < 104.935$ )

5. (a) Suppose that the weights of 100 male students of a university represent a random
sample of weights of 1546 students of the university. Find 99% confidence intervals
for the mean weight of the students, given $\bar{x} = 67.45$ and $s$ = 2.0 5.
( $66.68 < \mu < 68.22$ )

(b) 150 bags of flour of a particular brand are weighed and the mean mass is found to be
748 g with standard deviation 3.6 g. Find 98% confidence intervals for the mean
mass of bags of flour of this brand.
( $747.316 < \mu < 748.684$ )

6. (a) If the two-sided $100(1-\alpha)\%$ confidence interval based on a random sample taken
from $X \sim N(\mu, \sigma^2)$ is $12.18 < \mu < 20.56$, find $\bar{x}$.
( $\bar{x} = 16.37$ )

(b) On the basis of the results obtained from a random sample of 100 men from a particular
area, the 95% confidence interval for the mean height of the male population of the area
was found to be (177.22 cm to 179.18 cm). Find the value of $\bar{x}$, the mean of the sample,
and $\sigma$, the standard deviation of the normal population from which the sample was
drawn. Find 98% confidence interval for the mean height.
( $\bar{x} = 178.2$, $\sigma = 5$, $177.04 < \mu < 179.36$ )

7. (a) Explain what is meant by the statement, 'we are $100(1-\alpha)\%$ confident that our
interval estimate contains $\mu$.'
( In repeated sampling, $100(1-\alpha)\%$ of all such confidence intervals contain $\mu$. )

(b) Explain what is meant by the statement, 'we are 95% confident that our interval
contains $\mu$'.
( In repeated sampling, 95% of all such confidence intervals contain $\mu$. )

(c) If an 85% confidence interval is $27.5 < \mu < 43.8$, what does this statement mean?
( Intervals so formed would contain $\mu$ 85% of the time. )

(d) If $\alpha = 0.10$, how many intervals would be expected to contain $\mu$?
( We would expect about 90% of the intervals to contain $\mu$ and 10% to
miss $\mu$ in the long run in repeated sampling. )

8. (a) What role does the sample mean play in a two-sided confidence interval for $\mu$, based on
a random sample from $X \sim N(\mu, \sigma^2)$?
( The sample mean is the midpoint of the confidence interval but has no effect on the
length of the interval. )

(b) When setting a two-sided $100(1-\alpha)\%$ confidence interval for $\mu$, based on a
random sample of size $n$ from a normal population, how will the following changes
affect the length of the confidence interval for $\mu$? (Assume all other quantities
remain fixed.)
(i) increasing $n$        (ii) increasing $(1-\alpha)$
(iii) decreasing $n$      (iv) decreasing $(1-\alpha)$
(v) increasing $s^2$      (vi) increasing $\bar{x}$
(vii) increasing $\alpha$ (viii) decreasing $s$
( decreased, increased, increased, decreased, increased, no effect, decreased, decreased )
9. (a) Define Student's $t$-statistic. What assumptions are made about the population where the
$t$-distribution is used?

(b) The contents of 10 similar containers of a commercial soap are: 10.2, 9.7, 10.1, 10.3,
10.1, 9.8, 9.9, 10.4, 10.3 and 9.8 litres. Find 99% confidence interval for the mean
soap content of all such containers, assuming an approximate normal distribution.
( $9.807 < \mu < 10.313$ )

10. (a) The masses, in grams, of thirteen ball bearings taken at random from a batch are 21.4,
23.1, 25.9, 24.7, 23.4, 21.5, 25.0, 22.5, 26.9, 26.4, 25.8, 23.2, 21.9. Calculate a
95% confidence interval for the mean mass of the population, supposed normal, from
which these masses were drawn.
( $22.82 < \mu < 25.14$ )

(b) A random sample of seven independent observations of a normal variable gave
$\sum x = 35.9$, $\sum x^2 = 186.19$. Calculate a 90% confidence interval for the population
mean.
( $4.70 < \mu < 5.56$ )

11. (a) A random sample of eight observations of a normal variable gave $\sum x = 261.2$,
$\sum (x - \bar{x})^2 = 3.22$. Calculate a 95% confidence interval for the population mean.
( $32.08 < \mu < 33.22$ )

(b) A sample of 12 measurements of the breaking strength of cotton threads gave a mean
$\bar{x} = 209$ grams and a standard deviation $s = 35$ grams. Find 95% and 99%
confidence limits for the actual mean breaking strength.
( $186.76 < \mu < 231.24$; $177.62 < \mu < 240.38$ )

12. (a) A random sample of 16 values from a normal population showed a mean of 41.5
inches and a sum of squares of deviations from this mean equal to 135 (inches)$^2$. Show
that the 95% confidence limits for this mean are 39.9 and 43.1 inches.

(b) Find a 99% confidence interval for the mean of a normal distribution with $\sigma = 2.5$
if a sample of size 7 gave the values 9, 16, 10, 14, 8, 13, 14. What would be the
confidence interval if $\sigma$ were unknown?
( $9.566 < \mu < 14.434$; $7.797 < \mu < 16.203$ when $\sigma$ is unknown. )


12.9 CONFIDENCE INTERVAL FOR POPULATION
PROPORTION OF SUCCESSES, $\pi$


The interval $(L, U)$ is a $100(1-\alpha)\%$ confidence interval for the population
proportion of successes $\pi$ if, prior to sampling,
$$P(L < \pi < U) = 1-\alpha$$

This definition simply states that a confidence interval with confidence coefficient $1-\alpha$ is an
interval estimate such that the probability is $1-\alpha$ that the calculated limits include $\pi$ for any
random sample. In other words, in many repeated random samples of size $n$ from a Bernoulli
population, $100(1-\alpha)\%$ of the interval estimates will include the true population proportion
of successes $\pi$ and therefore will be correct, and $100\alpha\%$ of the interval estimates will not
include $\pi$ and therefore will be incorrect.

In many problems we must estimate the population proportion or percentage, for
example, the proportion of defectives found in a shipment of raw materials upon inspection. In this
case it seems reasonable that we are sampling from a Bernoulli population; hence our
problem is to estimate its parameter $\pi$. The interval is based on the estimator $P = X/n$, the
sample fraction of successes. We know that the sampling distribution of $P$ is a binomial
distribution. The binomial distribution of the estimator $P$ can be approximated by the normal
distribution with a mean of $\mu_P = \pi$ and a standard deviation of $\sigma_P = \sqrt{\pi(1-\pi)/n}$, when
$n$ is large and $\pi$ is not too near 0 or 1. Consequently the distribution of the statistic
$$Z = \frac{P - \pi}{\sqrt{P(1-P)/n}}$$
will be approximately normal with mean 0 and variance 1. Then a two-sided confidence interval
for the population proportion of successes $\pi$ is given by
$$P - z_{1-\alpha/2}\sqrt{\frac{P(1-P)}{n}} < \pi < P + z_{1-\alpha/2}\sqrt{\frac{P(1-P)}{n}}$$

If $p = x/n$ is the proportion of successes in an observed random sample of size $n$, then a
$100(1-\alpha)\%$ confidence interval for $\pi$ is given by
$$p - z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}} < \pi < p + z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
This can be written $p \pm z_{1-\alpha/2}\sqrt{\dfrac{p(1-p)}{n}}$.


12.9.1 Sampling Without Replacement. When sampling is done without replacement from a
finite population of size $N$, the standard error of $P$ is given by
$$\sigma_P = \sqrt{\frac{\pi(1-\pi)}{n}}\sqrt{\frac{N-n}{N-1}}$$
which is estimated as
$$s_P = \sqrt{\frac{P(1-P)}{n}}\sqrt{\frac{N-n}{N-1}}$$

If the sample size $n$ is greater than 5% of the population size $N$ (i.e., $n > 0.05N$), then a
$100(1-\alpha)\%$ confidence interval for $\pi$ is given by
$$P \pm z_{1-\alpha/2}\sqrt{\frac{P(1-P)}{n}}\sqrt{\frac{N-n}{N-1}}$$

The finite population correction $(N-n)/(N-1)$ may be ignored when the sample size $n$ is
less than 5% of the population size $N$ (i.e., $n < 0.05N$).





Example 12.13 In a random sample of 500 young persons from a small town, 40 were found
to be unemployed. Compute a 96% confidence interval for the rate of unemployment in the town.
Interpret the result.

Solution. The sample size, number of successes and proportion of successes in the sample are
$$n = 500, \quad x = 40, \quad p = \frac{x}{n} = \frac{40}{500} = 0.08$$
Confidence coefficient: $1-\alpha = 0.96$
$$1-\alpha = 0.96 \;\Rightarrow\; \alpha = 0.04 \;\Rightarrow\; \alpha/2 = 0.02 \;\Rightarrow\; 1-\alpha/2 = 0.98$$
$$z_{1-\alpha/2} = z_{0.98} = 2.054 \quad \{\text{From Table 10 (a)}\}$$

The two-sided 96% confidence interval for $\pi$ is
$$p - z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}} < \pi < p + z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
$$0.08 - 2.054\sqrt{\frac{0.08(1-0.08)}{500}} < \pi < 0.08 + 2.054\sqrt{\frac{0.08(1-0.08)}{500}}$$
$$0.055 < \pi < 0.105$$

We are 96% confident that the rate of unemployment is between 5.5% and 10.5%, because our
procedure will produce a true statement 96% of the time.


Example 12.14 A poll is taken among the residents of a city and the surrounding country to
determine the feasibility of a proposal to construct a civic centre. If 2400 of 5000 city residents
favour it, find almost certain limits for the true fraction favouring the proposal to construct the
civic centre.

Solution. The sample size, number of successes and proportion of successes in the sample are
$$n = 5000, \quad x = 2400, \quad p = \frac{x}{n} = \frac{2400}{5000} = 0.48$$
Confidence coefficient: $1-\alpha = 0.999$ (almost certain is 99.9% confident)
$$1-\alpha = 0.999 \;\Rightarrow\; \alpha = 0.001 \;\Rightarrow\; \alpha/2 = 0.0005 \;\Rightarrow\; 1-\alpha/2 = 0.9995$$
$$z_{1-\alpha/2} = z_{0.9995} = 3.291 \quad \{\text{From Table 10 (b)}\}$$

The two-sided 99.9% confidence interval for $\pi$ is
$$p - z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}} < \pi < p + z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}$$
$$0.48 - 3.291\sqrt{\frac{0.48(1-0.48)}{5000}} < \pi < 0.48 + 3.291\sqrt{\frac{0.48(1-0.48)}{5000}}$$
$$0.457 < \pi < 0.503$$
We are almost certain that the true fraction favouring the proposal to construct the civic centre
lies between 0.457 and 0.503.
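The large-sample interval for a proportion used in Examples 12.13 and 12.14 can be sketched as below. The function name `proportion_interval` is an assumption for illustration, and the quantile comes from `statistics.NormalDist` instead of Table 10, so it agrees with the tabulated 2.054 and 3.291 only to the printed precision.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(x, n, conf):
    """Approximate interval for pi: p +/- z * sqrt(p(1 - p) / n), with p = x / n."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{1 - alpha/2}
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Example 12.13: 40 unemployed out of 500, 96% confidence
lo13, hi13 = proportion_interval(40, 500, 0.96)
print(round(lo13, 3), round(hi13, 3))

# Example 12.14: 2400 in favour out of 5000, 99.9% confidence
lo14, hi14 = proportion_interval(2400, 5000, 0.999)
print(round(lo14, 3), round(hi14, 3))
```

The normal approximation behind this formula assumes that $n$ is large and that the proportion is not too near 0 or 1, as stated in the text; the sketch does not check those conditions.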





Example 12.15 A random sample of 250 from the 5000 students in Govt. College,
Gujranwala contained 30 left-handed students. Give an approximate 95% confidence interval
for the proportion of left-handed students in the college.

Solution. The sample size, number of successes and proportion of successes in the sample and
the population size are
$$n = 250, \quad x = 30, \quad p = \frac{x}{n} = \frac{30}{250} = 0.12, \quad N = 5000$$
Confidence coefficient: $1-\alpha = 0.95$
$$1-\alpha = 0.95 \;\Rightarrow\; \alpha = 0.05 \;\Rightarrow\; \alpha/2 = 0.025 \;\Rightarrow\; 1-\alpha/2 = 0.975$$
$$z_{1-\alpha/2} = z_{0.975} = 1.960 \quad \{\text{From Table 10 (b)}\}$$

The two-sided 95% confidence interval for $\pi$ in the finite population is
$$p \pm z_{1-\alpha/2}\sqrt{\frac{p(1-p)}{n}}\sqrt{\frac{N-n}{N-1}}$$
$$0.12 \pm 1.960\sqrt{\frac{0.12(1-0.12)}{250}}\sqrt{\frac{5000-250}{5000-1}}$$
$$(0.081,\ 0.159) \;\Rightarrow\; 0.081 < \pi < 0.159$$
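As a check on Example 12.15, the following sketch computes the interval with and without the finite population correction. The function name and the `use_fpc` switch are assumptions made for this illustration; the critical value 1.960 is the tabulated one.

```python
from math import sqrt

def proportion_interval_fpc(x, n, N, z, use_fpc=True):
    """Interval for pi when sampling without replacement from N units.

    With use_fpc=False the factor sqrt((N - n) / (N - 1)) is dropped,
    which gives the interval for an effectively infinite population.
    """
    p = x / n
    standard_error = sqrt(p * (1 - p) / n)
    if use_fpc:
        standard_error *= sqrt((N - n) / (N - 1))
    return p - z * standard_error, p + z * standard_error

with_fpc = proportion_interval_fpc(30, 250, 5000, 1.960)
without_fpc = proportion_interval_fpc(30, 250, 5000, 1.960, use_fpc=False)
print(with_fpc, without_fpc)
```

Here the sample is exactly 5% of the population, so the two intervals differ only slightly; the corrected one is the shorter of the two.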





Exercise 12.3

1. (a) A sample poll of 100 voters chosen at random from all voters in a given district
indicated that 55% of them were in favour of a particular candidate. Find
(i) 95% and (ii) 99% confidence limits for the proportion of all the voters in favour
of this candidate.
{ (i) $0.453 < \pi < 0.647$, (ii) $0.422 < \pi < 0.678$ }

(b) In a random sample of 1000 houses in a certain city, it is found that 228
own colour television sets. Find 98% confidence interval for the proportion of houses
in this city that have colour sets.
( $0.197 < \pi < 0.259$ )

2. (a) In 40 tosses of a coin 24 heads were obtained. Find (i) 95% and (ii) 99.73%
confidence limits for the proportion of heads which would be obtained in an unlimited
number of tosses of the coin.
{ (i) $0.448 < \pi < 0.752$, (ii) $0.368 < \pi < 0.832$ }

(b) A random sample of 200 voters in a constituency included 110 who said they would
vote for Mr. A. Assuming all the 15000 voters in the constituency would vote, give an
approximate 95% confidence interval for the proportion who would vote for Mr. A.
( $0.4815 < \pi < 0.6185$ )

(c) A random sample of 500 pineapples was taken from a large consignment and 65 were
found to be bad. Show that the percentage of bad pineapples in the consignment almost
certainly lies between 8.05 and 17.95.


12.10 COMPARATIVE STUDIES 


To this point we have been concerned with inferences about parameters of a single
population. We now turn our attention to estimation procedures that are important in comparing
the parameter values of two populations. To make inferences about two populations, we must
obtain two samples, one from each population. There are many methods by which the two
samples could be obtained; we will discuss two of them in this text. These methods result in
either independent or dependent samples.


12.10.1 Independent Samples. If two samples are selected, one from each of two populations,
then the two samples are independent if the selection of objects from one population is unrelated
to the selection of objects from the other population.


12.10.2 Dependent Samples. If two samples are selected, one from each of two populations,
then the two samples are dependent if, for each object selected from one population, an object is
chosen from the other population to form a pair of similar objects. These samples are also called
matched samples. The set of sample pairs is called a paired sample.


The key to recognizing two independent samples is to realize that they are always two
different random samples, whereas dependent samples always consist of matched, or paired,
observations.


12.11 CONFIDENCE INTERVAL FOR DIFFERENCE
BETWEEN TWO POPULATION MEANS, $\mu_1 - \mu_2$


In many business and management problems, we wish to estimate the difference
between the means of two populations. For instance, we may want to decide on the basis of
suitable samples to what extent, if any, a fertilizer is more effective than an existing fertilizer, a
newly introduced product is more reliable than an existing product, or the degree to which a
particular training programme improves worker attitudes or performance.
12.11.1 Independent Samples: Normal populations, known variances, any sample size. If
$\bar{X}_1$ and $\bar{X}_2$ are, respectively, the means of independent random samples of sizes $n_1$ and $n_2$
taken from two normal populations having means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, then
$\bar{X}_1 \sim N(\mu_1, \sigma_1^2/n_1)$ and $\bar{X}_2 \sim N(\mu_2, \sigma_2^2/n_2)$, and $\bar{X}_1$ is independent of $\bar{X}_2$.

Since any linear combination of independent normal random variables is also normally
distributed, $\bar{X}_1 - \bar{X}_2$ is a random variable having a normal distribution with mean
$\mu_1 - \mu_2$ and variance $\sigma_1^2/n_1 + \sigma_2^2/n_2$. Thus the distribution of the random variable
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
is a standard normal distribution. Then a two-sided $100(1-\alpha)\%$ confidence interval for the
difference between the means of the two populations, $\mu_1 - \mu_2$, is given by
$$(\bar{X}_1 - \bar{X}_2) \pm z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

If $\bar{x}_1$ and $\bar{x}_2$ are the means of the two independent observed samples, then a $100(1-\alpha)\%$
confidence interval for $\mu_1 - \mu_2$ is given by
$$(\bar{x}_1 - \bar{x}_2) \pm z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$





Example 12.16 Apex's current packing machinery is known to pour ground coffee into
"1-pound cans" with a standard deviation of 0.6 ounce. Apex is considering using a new packing
machine which is said to pour coffee into "1-pound cans" more accurately, with a standard
deviation of 0.3 ounce. Both machines pour ground coffee according to a normal distribution.
Before deciding to invest, Apex wishes to evaluate the performance of the new machine against
that of the old machine. A sample was taken on each machine and the mean weight of the
contents of the "1-pound cans" was recorded, yielding the following result.

                           Sample size      Sample mean (ounces)
Using Old Machine: I       $n_1 = 25$       $\bar{x}_1 = 16.7$
Using New Machine: II      $n_2 = 36$       $\bar{x}_2 = 15.8$

Construct a 95% confidence interval for the difference in the average weight of the contents
poured by the old versus the new machine.

Solution. The sizes and means of the two samples and the standard deviations of the two populations are
$$n_1 = 25, \quad \bar{x}_1 = 16.7, \quad \sigma_1 = 0.6$$
$$n_2 = 36, \quad \bar{x}_2 = 15.8, \quad \sigma_2 = 0.3$$

Confidence coefficient: $1-\alpha = 0.95$
$$1-\alpha = 0.95 \;\Rightarrow\; \alpha = 0.05 \;\Rightarrow\; \alpha/2 = 0.025 \;\Rightarrow\; 1-\alpha/2 = 0.975$$
$$z_{1-\alpha/2} = z_{0.975} = 1.960 \quad \{\text{From Table 10 (b)}\}$$

The two-sided 95% confidence limits for $\mu_1 - \mu_2$ are
$$(\bar{x}_1 - \bar{x}_2) \pm z_{1-\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
$$(16.7 - 15.8) \pm 1.960\sqrt{\frac{(0.6)^2}{25} + \frac{(0.3)^2}{36}}$$
$$(0.65,\ 1.15) \;\Rightarrow\; 0.65 < \mu_1 - \mu_2 < 1.15$$








12.11.2 Independent Samples: Any populations, variances known/unknown, large samples.
When both sample sizes are large (say greater than 30) the assumptions regarding small samples
can be greatly relaxed. It is no longer necessary to assume that the parent distributions are normal,
because the Central Limit Theorem assures that $\bar{X}_1$ is approximately normally distributed with
mean $\mu_1$ and variance $\sigma_1^2/n_1$, and that $\bar{X}_2$ is also approximately normally distributed with
mean $\mu_2$ and variance $\sigma_2^2/n_2$. Since $\bar{X}_1$ is independent of $\bar{X}_2$, $(\bar{X}_1 - \bar{X}_2)$ is
approximately normally distributed with mean $\mu_1 - \mu_2$ and variance $\sigma_1^2/n_1 + \sigma_2^2/n_2$.

Thus the random variable
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
has an approximately standard normal distribution. Because $n_1$ and $n_2$ are both large, the
approximation remains valid if $\sigma_1^2$ and $\sigma_2^2$ are replaced by their sample variances $S_1^2$ and
$S_2^2$. The assumption of equal variance is not required in inferences derived from large samples.
We can modify the previous result to obtain a confidence interval by substituting the
sample variances $S_1^2$ for $\sigma_1^2$ and $S_2^2$ for $\sigma_2^2$ as long as both samples are large enough
($n_1 > 30$, $n_2 > 30$) for the Central Limit Theorem to be invoked. Hence the distribution of the
random variable $\bar{X}_1 - \bar{X}_2$ approaches a normal distribution with mean $\mu_1 - \mu_2$ and variance
$S_1^2/n_1 + S_2^2/n_2$.

Then the distribution of the random variable
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
approaches the standard normal distribution. Then a two-sided $100(1-\alpha)\%$ approximate
confidence interval for the difference between the means of the two populations, $\mu_1 - \mu_2$, is given by
$$(\bar{X}_1 - \bar{X}_2) \pm z_{1-\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
If $\bar{x}_1$ and $\bar{x}_2$ are the means and $s_1^2$ and $s_2^2$ are the variances of the two independent observed
random samples, then a $100(1-\alpha)\%$ approximate confidence interval for $\mu_1 - \mu_2$ is
given by
$$(\bar{x}_1 - \bar{x}_2) \pm z_{1-\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$


Example 12.17 Rural and urban students are to be compared on the basis of their scores on a
nationwide musical aptitude test. Two random samples of sizes 90 and 100 are selected from
rural and urban seventh class students. The summary statistics from the test scores are

               Size            Mean                  Standard deviation
Rural: I       $n_1 = 90$      $\bar{x}_1 = 76.4$    $s_1 = 8.2$
Urban: II      $n_2 = 100$     $\bar{x}_2 = 81.2$    $s_2 = 7.6$

Establish a 98% confidence interval for the difference in population mean scores between urban
and rural students.

Solution. Confidence coefficient: $1-\alpha = 0.98$
$$1-\alpha = 0.98 \;\Rightarrow\; \alpha = 0.02 \;\Rightarrow\; \alpha/2 = 0.01 \;\Rightarrow\; 1-\alpha/2 = 0.99$$
$$z_{1-\alpha/2} = z_{0.99} = 2.326 \quad \{\text{From Table 10 (b)}\}$$

The two-sided 98% approximate confidence interval for $\mu_2 - \mu_1$ is
$$(\bar{x}_2 - \bar{x}_1) \pm z_{1-\alpha/2}\sqrt{\frac{s_2^2}{n_2} + \frac{s_1^2}{n_1}}$$
$$(81.2 - 76.4) \pm 2.326\sqrt{\frac{(7.6)^2}{100} + \frac{(8.2)^2}{90}}$$
$$(2.1,\ 7.5) \;\Rightarrow\; 2.1 < \mu_2 - \mu_1 < 7.5$$

We conclude, with 98% confidence, that the mean of urban scores is at least 2.1 units higher
and can be as much as 7.5 units higher than the mean of rural scores.
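Examples 12.16 and 12.17 share one formula: the difference of sample means plus or minus a critical value times $\sqrt{v_1/n_1 + v_2/n_2}$, where $v_1, v_2$ are the known population variances in the first case and the sample variances in the second. The sketch below makes this explicit; the function and parameter names are assumptions for illustration, and the critical values are the tabulated ones.

```python
from math import sqrt

def two_sample_z_interval(mean_a, mean_b, var_a, var_b, n_a, n_b, z):
    """Interval for (mu_a - mu_b) from independent samples.

    var_a, var_b are population variances if known, or sample variances
    when both samples are large (the large-sample approximation).
    """
    diff = mean_a - mean_b
    half_width = z * sqrt(var_a / n_a + var_b / n_b)
    return diff - half_width, diff + half_width

# Example 12.16: old minus new machine, known sigmas 0.6 and 0.3, z = 1.960
ex16 = two_sample_z_interval(16.7, 15.8, 0.6 ** 2, 0.3 ** 2, 25, 36, 1.960)

# Example 12.17: urban minus rural scores, sample sds 7.6 and 8.2, z = 2.326
ex17 = two_sample_z_interval(81.2, 76.4, 7.6 ** 2, 8.2 ** 2, 100, 90, 2.326)

print(ex16, ex17)
```

The order of the two samples only fixes the sign of the difference; swapping them mirrors the interval about zero without changing its length.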


12.11.3 Independent Samples: Normal populations, same unknown variance, small samples.
When $n_1$ and $n_2$ are small and $\sigma_1^2$ and $\sigma_2^2$ are unknown, the formula for constructing a
confidence interval that we have been discussing cannot be used. However, for independent
samples from two normal populations having the same unknown variance $\sigma^2$, we can develop a
confidence interval for $\mu_1 - \mu_2$ as follows:

If $\bar{X}_1$ and $\bar{X}_2$ are, respectively, the means of two independent random samples of sizes $n_1$ and $n_2$ taken
from populations which are $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, then $\bar{X}_1 \sim N(\mu_1, \sigma^2/n_1)$ and
$\bar{X}_2 \sim N(\mu_2, \sigma^2/n_2)$, and $\bar{X}_1$ is independent of $\bar{X}_2$.

Since any linear combination of independent normal random variables is also normally
distributed, $\bar{X}_1 - \bar{X}_2$ is normally distributed with mean and variance
$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2, \qquad \sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2} = \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)$$

Thus the random variable
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
has a standard normal distribution. If $S_1^2$ and $S_2^2$ are the two sample variances
(both estimating the variance $\sigma^2$ common to both populations), the pooled (weighted arithmetic
mean) estimator of $\sigma^2$, denoted by $S_p^2$, is
$$S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2} = \frac{\sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2}{n_1+n_2-2} = \frac{\left(\sum X_{1i}^2 - n_1\bar{X}_1^2\right) + \left(\sum X_{2i}^2 - n_2\bar{X}_2^2\right)}{n_1+n_2-2}$$
Then the random variable
$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
has a $t$-distribution with $\nu = n_1 + n_2 - 2$ degrees of freedom. Then a two-sided
$100(1-\alpha)\%$ confidence interval for the difference between the means of the two populations,
$\mu_1 - \mu_2$, is given by
$$(\bar{X}_1 - \bar{X}_2) \pm t_{\nu;\,1-\alpha/2}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

If $\bar{x}_1$ and $\bar{x}_2$ are the means of the two observed random samples and $s_p$ is the pooled estimate
of the common standard deviation of the two normal populations, then a $100(1-\alpha)\%$
confidence interval for $\mu_1 - \mu_2$ is given by
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\nu;\,1-\alpha/2}\,s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

It should be noted that this is used when the sample sizes are small
(i.e., $n_1 \le 30$ and $n_2 \le 30$). When both $n_1$ and $n_2$ are greater than 30, it is legitimate
to use
$$(\bar{X}_1 - \bar{X}_2) \pm z_{1-\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$$
as a good approximation.


Example 12.18 Suppose you want to estimate the difference in annual operation costs for 
automobiles with rotary engines and those with standard engines. You find 8 owners of cars 
with rotary engines and 12 owners with standard engines, who have purchased their cars within 
the last two years and are willing to participate in the experiment. Each of the 20 owners keeps 
accurate records of the amount spent on operating his or her car ( including gasoline, oil, 
repairs, etc.) fora 12 month period. All costs are recorded on a per 1000 mile basis to adjust 
for differences in mileage driven during the 12 month period. The results are summarized below: 





ome —_ 


Estimation | | | 101 


Estimate the true difference $(\mu_1 - \mu_2)$ between the mean operating cost per 1000 miles of
cars with rotary and standard engines. Use a 90% confidence level.


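Solution. The pooled estimate of the common population standard deviation is

$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}} = \sqrt{\frac{(8-1)(4.85)^2 + (12-1)(6.35)^2}{8+12-2}} = 5.813$$

Confidence coefficient: $1 - \alpha = 0.90$
$1 - \alpha = 0.90 \Rightarrow \alpha = 0.10 \Rightarrow \alpha/2 = 0.05 \Rightarrow 1 - \alpha/2 = 0.95$
Degrees of freedom: $\nu = n_1 + n_2 - 2 = 8 + 12 - 2 = 18$
$t_{\nu;\,1-\alpha/2} = t_{18;\,0.95} = 1.734$ (From Table 12)
The two-sided 90% confidence interval for $\mu_1 - \mu_2$ is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\nu;\,1-\alpha/2}\; s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$

$$(56.96 - 52.73) \pm 1.734\,(5.813)\sqrt{\frac{1}{8}+\frac{1}{12}}$$

$$(-0.37,\ 8.83) \Rightarrow -0.37 < \mu_1 - \mu_2 < 8.83$$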
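
The arithmetic of this solution can be checked with a short Python sketch. It is an illustration under stated assumptions: the function names `pooled_std` and `pooled_t_interval` are hypothetical, the standard deviations 4.85 and 6.35 are the values read from the pooled-estimate line of the solution, and the critical value 1.734 is supplied by hand from the t-table because the Python standard library has no t-quantile function.

```python
from math import sqrt


def pooled_std(n1, s1, n2, s2):
    """Pooled estimate s_p of the common standard deviation."""
    return sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))


def pooled_t_interval(n1, mean1, s1, n2, mean2, s2, t_crit):
    """Interval (x1bar - x2bar) +/- t * s_p * sqrt(1/n1 + 1/n2).

    t_crit is the tabulated value t_{nu; 1 - alpha/2} with nu = n1 + n2 - 2.
    """
    margin = t_crit * pooled_std(n1, s1, n2, s2) * sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin


# Figures of Example 12.18; t_{18; 0.95} = 1.734 is taken from the table.
sp = pooled_std(8, 4.85, 12, 6.35)
lower, upper = pooled_t_interval(8, 56.96, 4.85, 12, 52.73, 6.35, t_crit=1.734)
print(round(sp, 3), round(lower, 2), round(upper, 2))   # 5.813 -0.37 8.83
```

Because the interval contains zero, the data do not rule out equal mean operating costs at the 90% level.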
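Example 12.19 Given two random samples from two independent normal populations, with

Sample size      Sample mean          Sum of squares
$n_1 = 11$       $\bar{x}_1 = 75$     $\sum (x_1 - \bar{x}_1)^2 = 372.1$
$n_2 = 14$       $\bar{x}_2 = 60$     $\sum (x_2 - \bar{x}_2)^2 = 365.17$

Find a 99% confidence interval for $(\mu_1 - \mu_2)$. Assume that the population variances are equal.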


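Solution. The pooled estimate of the common population standard deviation is

$$s_p = \sqrt{\frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1+n_2-2}} = \sqrt{\frac{372.1 + 365.17}{11+14-2}} = 5.66$$

Confidence coefficient: $1 - \alpha = 0.99$
$1 - \alpha = 0.99 \Rightarrow \alpha = 0.01 \Rightarrow \alpha/2 = 0.005 \Rightarrow 1 - \alpha/2 = 0.995$
Degrees of freedom: $\nu = n_1 + n_2 - 2 = 11 + 14 - 2 = 23$
$t_{\nu;\,1-\alpha/2} = t_{23;\,0.995} = 2.807$ (From Table 12)

The two-sided 99% confidence interval for $(\mu_1 - \mu_2)$ is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\nu;\,1-\alpha/2}\; s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$

$$(75 - 60) \pm 2.807\,(5.66)\sqrt{\frac{1}{11}+\frac{1}{14}}$$

$$(8.6,\ 21.4) \Rightarrow 8.6 < \mu_1 - \mu_2 < 21.4$$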
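
When the data are summarized by sums of squared deviations rather than by standard deviations, the same interval can be computed directly from them. The Python sketch below is illustrative only: the function name is hypothetical, the first sum of squares is taken as 372.1 (an assumption that is consistent with the pooled value 5.66 quoted in the solution), and the critical value 2.807 is entered by hand from the t-table.

```python
from math import sqrt


def interval_from_sums_of_squares(n1, mean1, ss1, n2, mean2, ss2, t_crit):
    """Pooled-variance interval for mu1 - mu2 from sums of squared deviations.

    ss1 and ss2 are sum((x - xbar)^2) for each sample; t_crit is the tabulated
    t value with n1 + n2 - 2 degrees of freedom.
    """
    dof = n1 + n2 - 2
    sp = sqrt((ss1 + ss2) / dof)                    # pooled standard deviation
    margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return sp, diff - margin, diff + margin


# Figures of Example 12.19 (372.1 is an assumed reading of the first sum).
sp, lower, upper = interval_from_sums_of_squares(11, 75, 372.1, 14, 60, 365.17, 2.807)
print(round(sp, 2), round(lower, 1), round(upper, 1))   # 5.66 8.6 21.4
```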


Example 12.20 A course in mathematics is taught to 10 students by the conventional class
room method. A second group of 12 students was given the same course by means of
programmed materials. At the end of the semester, the same examination was given to each
group. Their scores are given below:

Group I     70  66  76  77  73  72  68  74  75  69
Group II    77  83  92  85  82  84  80  86  91  93  80  87

Compute a 90% confidence interval for the difference between the average scores of the two
populations. Assume the populations to be approximately normal with equal variance.
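Solution. The sample means and the pooled estimate of the common population standard deviation are

$$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{720}{10} = 72, \qquad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{1020}{12} = 85$$

$$s_p = \sqrt{\frac{\left(\sum x_1^2 - n_1\bar{x}_1^2\right) + \left(\sum x_2^2 - n_2\bar{x}_2^2\right)}{n_1+n_2-2}} = \sqrt{\frac{\{51960 - 10(72)^2\} + \{86982 - 12(85)^2\}}{10+12-2}} = 4.48$$

Confidence coefficient: $1 - \alpha = 0.90$
$1 - \alpha = 0.90 \Rightarrow \alpha = 0.10 \Rightarrow \alpha/2 = 0.05 \Rightarrow 1 - \alpha/2 = 0.95$
Degrees of freedom: $\nu = n_1 + n_2 - 2 = 10 + 12 - 2 = 20$
$t_{\nu;\,1-\alpha/2} = t_{20;\,0.95} = 1.725$ (From Table 12)

The two-sided 90% confidence interval for $\mu_2 - \mu_1$ is

$$(\bar{x}_2 - \bar{x}_1) \pm t_{\nu;\,1-\alpha/2}\; s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}$$

$$(85 - 72) \pm 1.725\,(4.48)\sqrt{\frac{1}{10}+\frac{1}{12}}$$

$$(9.69,\ 16.31) \Rightarrow 9.69 < \mu_2 - \mu_1 < 16.31$$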
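
Starting from the raw scores, the whole calculation can be reproduced with the Python standard library. The sketch below is an illustration, not part of the original solution: the function name is hypothetical, and the t-table value 1.725 is again typed in by hand. The difference is taken as Group II minus Group I, as in the solution.

```python
from math import sqrt
from statistics import mean, variance

group1 = [70, 66, 76, 77, 73, 72, 68, 74, 75, 69]
group2 = [77, 83, 92, 85, 82, 84, 80, 86, 91, 93, 80, 87]


def pooled_interval_from_data(first, second, t_crit):
    """Pooled-variance interval for (mean of second) - (mean of first)."""
    n1, n2 = len(first), len(second)
    # statistics.variance uses the divisor n - 1, so (n - 1) * variance
    # recovers the sum of squared deviations of each sample.
    sp2 = ((n1 - 1) * variance(first) + (n2 - 1) * variance(second)) / (n1 + n2 - 2)
    sp = sqrt(sp2)
    margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = mean(second) - mean(first)
    return sp, diff - margin, diff + margin


sp, lower, upper = pooled_interval_from_data(group1, group2, t_crit=1.725)
print(round(sp, 2), round(lower, 2), round(upper, 2))   # 4.48 9.69 16.31
```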


Exercise 12.4 


1. (a) A test in statistics was given to 50 girls and 75 boys. The girls made an average grade
of 76 with a standard deviation of 6, while the boys made an average grade of 82 with
a standard deviation of 8. Find a 96% confidence interval for the difference $\mu_1 - \mu_2$,
where $\mu_1$ is the mean score of all boys and $\mu_2$ is the mean score of all girls who might
take this test.

($3.42 < \mu_1 - \mu_2 < 8.58$)

(b) A manufacturing company consists of two departments producing identical products. It
is suspected that the hourly outputs in the two departments are different. Two random
samples of production hours are respectively selected and the following data are
obtained:

                  Department 1    Department 2
Sample size:           64              49
Sample mean:          100              90

The variances of the hourly outputs for the two departments are known to be $\sigma_1^2 = 256$
and $\sigma_2^2 = 196$ respectively. What is the point estimate for the true difference between
the mean outputs of the two departments? Find the 95 percent confidence limits for the
true difference.
($\bar{x}_1 - \bar{x}_2 = 10$; $4.456 < \mu_1 - \mu_2 < 15.544$)

2. (a) Two independent samples of 100 machinists and 100 carpenters are taken to estimate
the difference between the weekly wages of the two categories of workers. The relevant
data are given below:

                  Sample mean wages    Population variance
Machinists:             345                   196
Carpenters:             340                   204

Determine the 95% and the 99% confidence limits for the true difference between the
average wages for machinists and carpenters.
($1.08 < \mu_1 - \mu_2 < 8.92$; $-0.152 < \mu_1 - \mu_2 < 10.152$)

(b) General Incorporated Mill's packing machinery is known to pour dry cereal into
economy-size boxes with a standard deviation of 0.6 ounce. Two samples taken on two
machines yield the following information:

Machine I                       Machine II
$n_1 = 15$                      $n_2 = 21$
$\bar{x}_1 = 18.7$ ounces       $\bar{x}_2 = 21.9$ ounces

Assuming machine I packages a content that is $N(\mu_1, 0.36)$ and machine II
packages a content that is $N(\mu_2, 0.36)$, construct a 95% confidence interval estimate
of $\mu_1 - \mu_2$.

3. (a) A sample of 150 brand A light bulbs showed a mean lifetime of 1400 hours with a
standard deviation of 120 hours. A sample of 200 brand B light bulbs showed a mean
lifetime of 1200 hours with a standard deviation of 80 hours. Find 95% and 99%
confidence limits for the difference between the mean lifetimes of the populations of
brands A and B.

($177.825 < \mu_1 - \mu_2 < 222.175$; $170.856 < \mu_1 - \mu_2 < 229.144$)

(b) Let two independent random samples, each of size 100, from independent normal
distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ yield $\bar{x}_1 = 4.8$, $s_1^2 = 8.64$, $\bar{x}_2 = 5.6$,
$s_2^2 = 7$. Find a 95% confidence interval for $(\mu_2 - \mu_1)$.
($0.025 < \mu_2 - \mu_1 < 1.575$)

4. (a) In order to ascertain the age distribution of operatives in a certain industry, random
samples of 1720 males and 1230 females are drawn. The sample means and standard
deviations were 33.93 years and 14.20 years for the males and 27.44 years and 10.79
years for the females. Calculate the 95 percent confidence interval for

(i) the mean age of all male operatives,
(ii) the mean age of all female operatives,
(iii) the difference between their mean ages.
($33.259 < \mu_1 < 34.601$; $26.837 < \mu_2 < 28.043$; $5.588 < \mu_1 - \mu_2 < 7.392$)

(b) The means and variances of the weekly incomes in rupees of the workers employed in
two different factories, computed from samples, are given below:

              Sample Size     Mean      Variance
Factory A         160         12.80        64
Factory B         220         11.25        49

(i) What is the maximum likelihood estimate of the difference in mean incomes?
(ii) Compute the 95 percent confidence interval estimate for the real difference in
the incomes of the workers from the two factories.
{ (i) 1.55, (ii) $0.003 < \mu_1 - \mu_2 < 3.097$ }


5. (a) Let two independent random samples, each of size 100, from two independent normal
distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ yield $\bar{x}_1 = 4.8$, $s_1^2 = 8.64$, $\bar{x}_2 = 5.6$,
$s_2^2 = 7.88$. Find a 95% confidence interval for $(\mu_1 - \mu_2)$.
($-1.6 < \mu_1 - \mu_2 < 0$)
(b) Given that
$\bar{x}_1 = 75$, $n_1 = 9$, $\sum (x_{1j} - \bar{x}_1)^2 = 1482$
$\bar{x}_2 = 60$, $n_2 = 16$, $\sum (x_{2j} - \bar{x}_2)^2 = 1830$
and assuming that the two samples were randomly selected from two normal populations
in which $\sigma_1^2 = \sigma_2^2$ (but unknown), calculate an 80% confidence interval for the
difference between the two population means.
($8.4 < \mu_1 - \mu_2 < 21.6$)


6. (a) Two random samples of size $n_1 = 9$ and $n_2 = 16$ from two independent populations
having normal distributions provide the means and standard deviations $\bar{x}_1 = 64$,
$\bar{x}_2 = 59$, $s_1 = 6$ and $s_2 = 5$. Find a 95% confidence interval for $\mu_1 - \mu_2$, assuming
$\sigma_1 = \sigma_2$.

($0.37 < \mu_1 - \mu_2 < 9.63$)

(b) A course in mathematics is taught to 12 students by the conventional class room
method. A second group of 10 students was given the same course by means of
programmed materials. At the end of the course, the same examination was given to each
group. The 12 students meeting in the class room made an average grade of 85 with a
standard deviation of 4, while the 10 students using programmed materials made an
average of 81 with a standard deviation of 5. Find a 90% confidence interval for the
difference between the population means, assuming the populations to be approximately
normally distributed with equal variance.

($0.693 < \mu_1 - \mu_2 < 7.307$)


12.12 CONFIDENCE INTERVAL FOR DIFFERENCE
BETWEEN TWO POPULATION PROPORTIONS, $\pi_1 - \pi_2$

We now turn to statistical inferences concerning a comparison between the rates of
incidence of a characteristic in two populations. Comparing infant mortality in two groups, the
unemployment rates in rural and urban populations, and the proportions of defective items
produced by two competing manufacturing processes are examples of this type. The unknown
proportions of elements possessing the particular characteristic in population I and in population
II are denoted by $\pi_1$ and $\pi_2$, respectively. Our aim is to construct confidence intervals for the
difference $\pi_1 - \pi_2$.

A random sample of size $n_1$ is taken from population I and the number of successes is denoted
by $X_1$. An independent random sample of size $n_2$ is taken from population II and the number
of successes is denoted by $X_2$. The sample proportions of successes are

$$P_1 = \frac{X_1}{n_1}, \qquad P_2 = \frac{X_2}{n_2}$$

An intuitively appealing estimator for $\pi_1 - \pi_2$ is the difference between the sample proportions,
$P_1 - P_2$. When constructing the confidence intervals for $\pi_1 - \pi_2$ we will use the sampling
distribution of $P_1 - P_2$.


When both sample sizes $n_1$ and $n_2$ are large, the Central Limit Theorem assures that $P_1$ is
approximately normal with mean $\pi_1$ and variance $\pi_1(1-\pi_1)/n_1$, that $P_2$ is approximately
normal with mean $\pi_2$ and variance $\pi_2(1-\pi_2)/n_2$, and that $P_1$ is independent of $P_2$.

Since any linear combination of independent normal random variables is also normally
distributed, for large sample sizes $n_1$ and $n_2$ the sampling distribution of the random
variable $P_1 - P_2$ is approximately normal with mean

$$\mu_{P_1 - P_2} = \pi_1 - \pi_2$$

and standard deviation

$$\sigma_{P_1 - P_2} = \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}$$

The first result shows that $P_1 - P_2$ is an unbiased estimator of $\pi_1 - \pi_2$. For large sample sizes
$n_1$ and $n_2$, the random variable

$$Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{\dfrac{\pi_1(1-\pi_1)}{n_1} + \dfrac{\pi_2(1-\pi_2)}{n_2}}}$$

is approximately standard normal. The estimate of the standard error of $P_1 - P_2$ can be obtained
by replacing $\pi_1$ and $\pi_2$ by their sample estimates $P_1$ and $P_2$ as

$$\hat{\sigma}_{P_1 - P_2} = \sqrt{\frac{P_1(1-P_1)}{n_1} + \frac{P_2(1-P_2)}{n_2}}$$

The random variable Z then becomes

$$Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{\dfrac{P_1(1-P_1)}{n_1} + \dfrac{P_2(1-P_2)}{n_2}}}$$
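
The claims about the mean and standard deviation of $P_1 - P_2$ can be examined by simulation. The Python sketch below is an illustration only: the proportions, sample sizes, number of repetitions and seed are all assumed values, and the function name is hypothetical. It draws many pairs of independent samples and compares the simulated mean and standard deviation of $P_1 - P_2$ with $\pi_1 - \pi_2$ and $\sigma_{P_1 - P_2}$.

```python
import random
from math import sqrt
from statistics import mean, stdev


def simulate_difference(pi1, n1, pi2, n2, reps=4000, seed=12345):
    """Simulate P1 - P2 for independent samples and return its mean and sd."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        x1 = sum(rng.random() < pi1 for _ in range(n1))   # successes in sample 1
        x2 = sum(rng.random() < pi2 for _ in range(n2))   # successes in sample 2
        diffs.append(x1 / n1 - x2 / n2)
    return mean(diffs), stdev(diffs)


# Assumed illustrative values, not taken from any data set.
pi1, n1, pi2, n2 = 0.4, 100, 0.2, 80
sim_mean, sim_sd = simulate_difference(pi1, n1, pi2, n2)
theory_sd = sqrt(pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2)
print("simulated mean:", sim_mean, " theoretical mean:", pi1 - pi2)
print("simulated sd:  ", sim_sd, " theoretical sd:  ", theory_sd)
```

The simulated figures should lie close to the theoretical ones, and the agreement improves as the number of repetitions grows.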


Therefore, in the case of two large, independent random samples, a $100(1-\alpha)\%$ confidence
interval for $\pi_1 - \pi_2$ can be readily constructed using the normal approximation.

Then a two-sided $100(1-\alpha)\%$ confidence interval for $\pi_1 - \pi_2$ is given by

$$(P_1 - P_2) \pm z_{1-\alpha/2}\sqrt{\frac{P_1(1-P_1)}{n_1} + \frac{P_2(1-P_2)}{n_2}}$$

If $p_1$ and $p_2$ are the proportions in the two large, independent observed random samples, then a
$100(1-\alpha)\%$ confidence interval for $\pi_1 - \pi_2$ is given by

$$(p_1 - p_2) \pm z_{1-\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$





Example 12.21 An antibiotic for pneumonia was injected into 100 patients with kidney
malfunctions (called uremic patients) and into 80 patients with no kidney malfunctions (called
normal patients). Some allergic reaction developed in 40 of the uremic patients and in 16 of the
normal patients. Construct a 95% confidence interval for the difference between the population
proportions.


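Solution. The sizes, numbers of successes and proportions of successes in the two samples are

$$n_1 = 100, \quad x_1 = 40, \quad p_1 = \frac{x_1}{n_1} = \frac{40}{100} = 0.4$$

$$n_2 = 80, \quad x_2 = 16, \quad p_2 = \frac{x_2}{n_2} = \frac{16}{80} = 0.2$$

Confidence coefficient: $1 - \alpha = 0.95$
$1 - \alpha = 0.95 \Rightarrow \alpha = 0.05 \Rightarrow \alpha/2 = 0.025 \Rightarrow 1 - \alpha/2 = 0.975$
$z_{1-\alpha/2} = z_{0.975} = 1.960$ { From Table 10 (d) }

The two-sided 95% confidence interval for the difference in the population proportions $\pi_1 - \pi_2$ is

$$(p_1 - p_2) \pm z_{1-\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$$

$$(0.4 - 0.2) \pm 1.960\sqrt{\frac{(0.4)(0.6)}{100} + \frac{(0.2)(0.8)}{80}}$$

$$(0.07,\ 0.33) \Rightarrow 0.07 < \pi_1 - \pi_2 < 0.33$$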
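
The interval of this example can be reproduced with a few lines of Python. The sketch is illustrative: the function name `two_proportion_interval` is hypothetical, and the normal quantile is computed with `statistics.NormalDist` from the standard library rather than read from the table.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for pi1 - pi2."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # z_{1 - alpha/2}
    std_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * std_error, diff + z * std_error


# Example 12.21: 40 reactions among 100 uremic and 16 among 80 normal patients.
lower, upper = two_proportion_interval(40, 100, 16, 80)
print(round(lower, 2), round(upper, 2))   # 0.07 0.33
```

Since the whole interval lies above zero, the data suggest a higher rate of allergic reaction among uremic patients.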





Exercise 12.5 


1. (a) In a poll of college students in a large state university, 300 out of 400 students living
in dormitories approved a certain course of action, whereas 200 out of 300 students not
living in dormitories approved it. Estimate the difference in the proportions favouring
the course of action and compute a 90% confidence interval for it.

($p_1 - p_2 = 0.08$; $0.023 < \pi_1 - \pi_2 < 0.137$)

(b) In a random sample of 400 adults and 600 teenagers who watched a certain television
programme, 100 adults and 300 teenagers indicated that they liked it. Construct 95%
and 99% confidence limits for the difference in proportions of all adults and all
teenagers who watched the programme and liked it.

($0.19 < \pi_1 - \pi_2 < 0.31$; $0.17 < \pi_1 - \pi_2 < 0.33$)


2. (a) A poll is taken among the residents of a city and the surrounding country to
determine the feasibility of a proposal to construct a civic centre. If 2400 of 5000 city
residents favour the proposal and 1200 of 2000 country residents favour it, find a
95% confidence interval for the true difference in the proportions favouring the proposal
to construct the civic centre.
($0.0945 < \pi_1 - \pi_2 < 0.1455$)

(b) The populations of interest are the voting preferences of all registered voters in the Punjab
and the Sind. Two independent random samples were taken from these populations and
the values $n_1 = n_2 = 1000$, $p_1 = 0.54$ and $p_2 = 0.47$ were obtained. Find a 95% confidence
interval for $\pi_1 - \pi_2$.

($0.026 < \pi_1 - \pi_2 < 0.114$)
3. (a) A market survey organization carried out a product taste study with consumers in two
regions. In one region, a random sample of $n_1 = 400$ consumers was selected while in
the other region an independent random sample of $n_2 = 300$ consumers was selected.
Each person was asked to indicate which of two servings of product had a better taste.
Unknown to the subject, one serving was a new high protein breakfast cereal and the
other was an existing cereal. In the first region, proportion $p_1 = 0.55$ of the sample
persons preferred the new cereal and in the second region the proportion was
$p_2 = 0.65$. Construct a 90% confidence interval for $\pi_2 - \pi_1$.

($0.039 < \pi_2 - \pi_1 < 0.161$)

(b) Independent random samples are selected from two populations with fractions of
success $\pi_1$ and $\pi_2$. Construct a 95% confidence interval for $\pi_1 - \pi_2$ for each of the
following cases.
(i)   $n_1 = 100$   $p_1 = 0.72$   $n_2 = 100$   $p_2 = 0.61$
(ii)  $n_1 = 130$   $p_1 = 0.16$   $n_2 = 210$   $p_2 = 0.25$
(iii) $n_1 = 70$    $p_1 = 0.53$   $n_2 = 60$    $p_2 = 0.48$

{ (i) -0.02 to 0.24 (ii) -0.176 to -0.004 (iii) -0.12 to 0.22 }





Exercise 12.6 
Objective Questions 
1. Fill in the blanks. 

(i) Statistical __________ is the conclusion made about the unknown
value of population parameter by using the sample observations. (inference)
(ii) The statistical __________ is a procedure of making judgment
about the unknown value of population parameters by using the
sample observations. (estimation)
(iii) The object of __________ estimation is to obtain a single number
from the sample that is intended for estimating the unknown
true value of a population parameter. (point)
(iv) A point estimator is a __________ variable whereas an estimate is
a constant. (random)
(v) An estimator is __________ if its expected value is equal to the
population parameter to be estimated. (unbiased)
(vi) If T is a biased estimator, then __________ is the difference of its
expected value from the parameter $\theta$ to be estimated. (bias)
(vii) The sample mean $\bar{X}$ is an __________ estimator of population
mean $\mu$. (unbiased)
(viii) The sample proportion P is an __________ estimator of
population proportion $\pi$. (unbiased)
(ix) The sample variance $S^2 = \sum (X - \bar{X})^2/(n-1)$ is an __________
estimator of population variance $\sigma^2$. (unbiased)
(x) __________ estimation is a procedure of constructing an interval
from a random sample, such that prior to sampling, it has a high
specified probability of including the unknown true value of a
population parameter. (Interval)
2. Fill in the blanks. 
(i) The width of a confidence interval is __________ if the level of
confidence $(1 - \alpha)$ is decreased. (decreased)
(ii) The width of a confidence interval is __________ related to
confidence coefficient. (directly)
(iii) The precision of confidence interval is increased by __________
the level of confidence. (decreasing)
(iv) The width of a confidence interval __________ if the sample
size is increased. (decreases)
(v) The confidence coefficient is also called __________ of
confidence. (level)
(vi) $1 - \alpha$ is the __________ that the interval estimator includes
the unknown true value of the population parameter. (probability)
(vii) A sample consisting of 30 or less observations is known as
a __________ sample. (small)
(viii) A sample consisting of more than 30 observations is known
as a __________ sample. (large)
3. Mark off the following statements as true or false.
(i) The types of statistical inferences are estimation of
parameters and testing of hypotheses. (true)
(ii) The types of statistical estimation of parameters are point
estimation and interval estimation. (true)
(iii) A point estimator is a sample statistic that is used to estimate
the unknown true value of a population parameter. (true)
(iv) A point estimate is a specific value of an estimator computed
from the sample data after the sample has been observed. (true)
(v) An estimate obtained from the sample observations is always
a point estimate. (false)

(vi) Point estimators may be more useful than interval estimators
because probability statements are attached to point
estimates. (false)
(vii) The point estimation provides two values with a probability
statement for estimating the unknown true value of a
population parameter. (false)
(viii) A confidence interval is a type of statistical inference. (true)
(ix) A point estimate provides information about the precision of
the estimate. (false)

4. Mark off the following statements as true or false.
(i) We cannot control the precision of an interval estimate by the
choice of sample size or level of confidence. (false)
(ii) The width of a confidence interval increases if the
confidence coefficient is decreased. (false)
(iii) The width of a confidence interval decreases if the
confidence coefficient is decreased. (true)
(iv) The width of a confidence interval can be decreased by
decreasing the confidence coefficient. (true)
(v) The precision of an interval estimate can be increased by
decreasing the sample size. (false)
(vi) The precision of an interval estimate can be increased either
by increasing the sample size or by decreasing the confidence
coefficient. (true)
(vii) $\alpha$ is the probability that the interval estimator includes the
unknown true value of the population parameter. (false)
(viii) The statistic T can be used in making confidence interval
for $\mu$ when population is non-normal. (false)
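
Several of the statements above concern how the width of a confidence interval responds to the confidence coefficient and to the sample size. A small Python sketch makes the dependence concrete. It assumes, purely for illustration, an interval for a single mean with known standard deviation, whose width is $2\,z_{1-\alpha/2}\,\sigma/\sqrt{n}$; the function name and the numerical values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def interval_width(sigma, n, confidence):
    """Width 2 * z_{1-alpha/2} * sigma / sqrt(n) of a known-sigma interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / sqrt(n)


# Assumed values: sigma = 10, with two sample sizes and three confidence levels.
for n in (25, 100):
    for confidence in (0.90, 0.95, 0.99):
        width = interval_width(10, n, confidence)
        print(f"n = {n:3d}, confidence = {confidence:.2f}, width = {width:.2f}")
```

For a fixed sample size the width grows with the confidence coefficient, and for a fixed confidence coefficient quadrupling the sample size halves the width.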


HYPOTHESIS 
TESTING 





13.1 THE ELEMENTS OF A TEST OF HYPOTHESIS 


We are often concerned with testing or drawing a conclusion about the population
parameter $\theta$ based on simple random sample data. In this chapter we present formal structures
for making inferences about population parameters such as $\mu$, $\pi$, $\sigma^2$, etc. We will begin by
introducing a test of hypothesis.


13.1.1 Statistical Hypothesis. A statistical hypothesis is an assertion or conjecture about the
distribution of one or more random variables. It is an assertion about the nature of a population.
This assertion may or may not be true. Its validity is tested on the basis of an observed random
sample. Hypotheses are usually phrased quantitatively in terms of population parameters. The
following are some examples of hypotheses.

(i) $\mu = 25$ (A population mean equals 25)
(ii) $\mu < 40$ (A population mean is less than 40)
(iii) $\mu \ge 20$ (A population mean is greater than or equal to 20)
(iv) $\pi = 0.4$ (A population proportion equals 0.4)


13.1.2 Hypothesis Testing. Statistical hypothesis testing is a procedure to determine
whether or not an assumption about some parameter of a population is supported by the
information obtained from the observed random sample.


13.1.3 Specification of the Form of Population Distribution. The experimenter, or
researcher, must make an assumption about the nature of the underlying distribution of the
population. Is the random variable normal or binomial, or does it follow some other form of
distribution? It is, therefore, necessary to identify the theoretical probability distribution of the
random variable under consideration because the decision about the hypothesis is made on the
basis of probability of occurrence.


We will find that there are certain elements common to all tests of hypotheses. These 
elements are introduced and discussed below: 


13.1.4 Null Hypothesis. A null hypothesis, denoted by $H_0$, is that hypothesis which is tested for
possible rejection (or nullification) under the assumption that it is true.


A null hypothesis is always a statement of either (1) "no effect" or (2) "the status
quo", such as the given coin is unbiased, or a drug is ineffective in curing a particular disease, or
there is no difference between the two production methods, or the production line requires no
preventive maintenance, etc. The experiment is conducted to see if this hypothesis is
unreasonable.


13.1.5 Establishment of the Null Hypothesis. Let $\theta$ represent the true but unknown value of
the population parameter and $\theta_0$ a value on the number line. The hypothesis to be tested will take
on one of the following three forms.

(i) $\theta = \theta_0$, that is, the true value of the population parameter is equal to some specified
value $\theta_0$.

(ii) $\theta \ge \theta_0$, that is, the true value of the population parameter is equal to or greater than
some specified value $\theta_0$.

(iii) $\theta \le \theta_0$, that is, the true value of the population parameter is equal to or less than
some specified value $\theta_0$.


13.1.6 Alternative Hypothesis. An alternative hypothesis, denoted by $H_1$, is that hypothesis
which we are willing to accept when the null hypothesis is rejected.

An alternative hypothesis gives the opposing conjecture to that given in the null
hypothesis. The alternative hypothesis is often called the research hypothesis, because this
hypothesis expresses the theory that the experimenter, or researcher, believes to be true. The
experiment is conducted to see if the alternative hypothesis is supported.


13.1.7 Formulation of Null and Alternative Hypothesis. The alternative (or research)
hypothesis is a statement about the value of a population parameter that an investigator attempts
to support with an observed random sample. Statistical hypothesis testing makes use of the null
hypothesis that refers to the same population parameter but denies the alternative hypothesis.
Thus the basic strategy in statistical hypothesis testing is to attempt to support the research
hypothesis by contradicting the null hypothesis. Therefore, when choosing the null and alternative
hypotheses, take the following steps:

(i) The experiment is conducted to see if there is support for some hypothesis. This will be
the alternative hypothesis, expressed as an inequality in the form "less than" or "greater
than" or "not equal to".

Example: $H_1: \theta < 40$

(ii) State the null hypothesis with an equality sign as a complement of the alternative
hypothesis.

Example: $H_0: \theta \ge 40$


The following table presents the three types of alternative hypotheses that constitute the
counterparts to the three types of null hypotheses.

     Null hypothesis $H_0$        Alternative hypothesis $H_1$
1.   $\theta \ge \theta_0$        $\theta < \theta_0$
2.   $\theta \le \theta_0$        $\theta > \theta_0$
3.   $\theta = \theta_0$          $\theta \ne \theta_0$ (i.e., $\theta < \theta_0$ or $\theta > \theta_0$)














Example 13.1 Formulate the null and the alternative hypotheses used in test of hypothesis for
each of the following:
(i) The mean lifetime of electric light bulbs newly manufactured by a company has not
changed from the previous mean lifetime of 1200 hours.
(ii) An automobile is driven on the average no more than 16000 kilometers per year.
(iii) At least 10% of the people of Pakistan pay income tax.
(iv) The proportion of the households that do not own a colour television set is more than
0.40 in a locality.
(v) The average yield of corn of variety A exceeds the average yield of variety B by at
least 200 kilogram per acre.


Solution. The null hypothesis $H_0$ and the alternative hypothesis $H_1$ for each of the given
situations are:

(i)   $H_0: \mu = 1200$ hours              against   $H_1: \mu \ne 1200$ hours
(ii)  $H_0: \mu \le 16000$ kilometers      against   $H_1: \mu > 16000$ kilometers
(iii) $H_0: \pi \ge 0.10$                  against   $H_1: \pi < 0.10$
(iv)  $H_0: \pi \le 0.40$                  against   $H_1: \pi > 0.40$
(v)   $H_0: \mu_1 - \mu_2 \ge 200$ kg      against   $H_1: \mu_1 - \mu_2 < 200$ kg
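
The pairing of null and alternative hypotheses follows a fixed rule: the null hypothesis carries the equality sign and is the complement of the alternative. The rule can be sketched as a small lookup in Python. The function name `formulate`, the relation symbols written as text, and the parameter labels are illustrative assumptions.

```python
# Complement of each alternative relation; the null always contains equality.
COMPLEMENT = {"<": ">=", ">": "<=", "!=": "="}


def formulate(parameter, alt_relation, value):
    """Return (H0, H1) as strings, given the relation claimed by H1."""
    null_relation = COMPLEMENT[alt_relation]
    h0 = f"H0: {parameter} {null_relation} {value}"
    h1 = f"H1: {parameter} {alt_relation} {value}"
    return h0, h1


print(formulate("mu", "!=", 1200))   # bulb lifetime has not changed
print(formulate("mu", ">", 16000))   # driven no more than 16000 km on average
print(formulate("pi", "<", 0.10))    # at least 10% pay income tax
```

Each call returns the null hypothesis first and the alternative second, for example `('H0: mu = 1200', 'H1: mu != 1200')` for the first case.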


13.1.8 Simple Hypothesis. A statistical hypothesis is said to be a simple hypothesis if it
completely specifies the underlying population distribution, namely

(i) The functional form of the distribution, and
(ii) The specific values of all of its parameters.

That is, if it specifies the particular member of the particular family of probability
distributions, e.g., a random variable has a normal distribution with mean $\mu = 30$ and standard
deviation $\sigma = 4$; or a random variable has a binomial distribution with $n = 6$ and $\pi = 0.4$.


13.1.9 Composite Hypothesis. A statistical hypothesis is said to be a composite hypothesis if
it does not completely specify the underlying population distribution, e.g., a random variable has
a normal distribution with mean $\mu = 40$; or a random variable has a normal distribution with
mean $\mu = 50$ and standard deviation $\sigma \ge 4$; or a random variable has a distribution with mean
$\mu = 40$ and standard deviation $\sigma = 5$; or a random variable has a binomial distribution with
$\pi = 0.4$.


13.1.10 Test Statistic. The test statistic is a sample statistic which provides a basis for deciding
whether or not the null hypothesis should be rejected. The most commonly used test statistics are
$Z$, $T$, $\chi^2$ and $F$.


13.1.11 Rejection Region. A rejection region specifies a set of values of the test statistic for
which the null hypothesis is rejected (and for which the alternative hypothesis is accepted). It is
also called the critical region.


13.1.12 Nonrejection Region. A nonrejection region specifies a set of values of the test
statistic for which the null hypothesis is not rejected. It is also called the noncritical region.


13.1.13 Critical Values. The values of the test statistic which separate the rejection and
nonrejection regions for the test are called critical values.




13.1.14 Two-tailed Test. If the critical region is located equally in both tails of the sampling
distribution of test statistic, the test is called a two-tailed or two-sided test. In such tests, the analyst
is concerned with detecting values of the test statistic that are either too large or too small to be
consistent with the hypothesis being tested.

[Fig. 13.1: Rejection regions in both tails, acceptance region of area $1 - \alpha$ between them. Reject $H_0$ if $Z < z_{\alpha/2}$ or $Z > z_{1-\alpha/2}$]

With a two-tailed test, acceptance of null hypothesis means acceptance of a unique value
for the population parameter.

13.1.15 One-tailed Test. If the critical region is located in only one tail of the sampling
distribution of test statistic, the test is called a one-tailed or one-sided test.


13.1.16 Left-tailed Test. If the critical region is located in only the left tail of the sampling
distribution of the test statistic, the test is called a left-tailed test. In such tests, the analyst is
concerned with detecting values of the test statistic that are too small to be consistent with the
hypothesis being tested.

Fig: 13.2 Reject $H_0$ if $Z < z_{\alpha}$ (rejection region in the left tail; acceptance region to its right)


With a one-tailed test, acceptance of null hypothesis means accepting that the population 
parameter is one of many acceptable values. 


13.1.17 Right-tailed Test. If the critical region is located in only the right tail of the sampling
distribution of the test statistic, the test is called a right-tailed test. In such tests the analyst is
concerned with detecting the values of the test statistic that are too large to be consistent with
the hypothesis being tested.

Fig: 13.3 Reject $H_0$ if $Z > z_{1-\alpha}$ (rejection region in the right tail; acceptance region to its left)


13.1.18 Deciding upon an Appropriate Test. Now the question arises as to how we should
decide upon an appropriate test. While deciding upon an appropriate test the following hints are
helpful:
(i) If we are looking for a definite decrease, i. e., if $H_1$ is given by $\theta < \theta_0$, we use a
one-sided left tail test.
(ii) If we are looking for a definite increase, i. e., if $H_1$ is given by $\theta > \theta_0$, we use a
one-sided right tail test.
(iii) If we are looking for any change, i. e., if $H_1$ is given by $\theta \ne \theta_0$, we use a two-sided
test.
The preceding discussion describing the nature of the one-tailed and two-tailed tests is 
summarized in the following table: 


Table: Describing the nature of test.

Null hypothesis | Alternative hypothesis | Type of test | Location of rejection region
$\theta \ge \theta_0$ | $\theta < \theta_0$ | Left-tailed | The rejection region is located in the left tail of the sampling distribution under $H_0$.
$\theta \le \theta_0$ | $\theta > \theta_0$ | Right-tailed | The rejection region is located in the right tail of the sampling distribution under $H_0$.
$\theta = \theta_0$ | $\theta \ne \theta_0$ | Two-tailed | The rejection region is located equally in both tails of the sampling distribution under $H_0$.


Remarks: The names of the three types of test are associated with the alternative hypothesis $H_1$.
The choice of a one-sided left tail test, one-sided right tail test or a two-sided test 
depends upon the type of alternative hypothesis. 
13.1.19 Errors of Inference. Sample evidence is used to test the null hypothesis $H_0$. If the
sample evidence convinces us that $H_0$ has only a small chance of being true (a large chance of
being false), we say $H_0$ is unreasonable and reject it. If the sample evidence does not convince
us that $H_0$ is unreasonable, we do not reject $H_0$. Therefore, there are two alternative
conclusions.
(i) Reject the null hypothesis on the basis of sample evidence. 
(ii) Accept the null hypothesis on the basis of sample evidence. 
There are two possible states of nature of the null hypothesis. 
(i) The null hypothesis is true. 
(ii) The null hypothesis is false. 
Thus, in hypothesis testing one and only one of the following four possible outcomes will occur. 
(i) If the null hypothesis is true, we may reject it leading to a wrong decision. 
(ii) If the null hypothesis is true, we may accept or not reject it leading to a correct decision. 
(iii) If the null hypothesis is false, we may accept it leading to a wrong decision. 
(iv) _ If the null hypothesis is false, we may reject it leading to a correct decision. 
Type-I Error. A Type-I error is made by rejecting $H_0$ if $H_0$ is actually true.
Type-II Error. A Type-II error is made by accepting $H_0$ if $H_1$ is actually true.


These four possible outcomes can be displayed in four cells of a table as follows:

Table: Hypothesis and conclusion reached from sample

State of nature of hypotheses
Conclusion from sample evidence | $H_0$ is true | $H_0$ is false
Reject $H_0$ | Type-I error | Correct decision
Do not reject (or accept) $H_0$ | Correct decision | Type-II error


The probabilities of the four possible outcomes are commonly designated as

$\alpha$ = P( Rejecting $H_0$ | $H_0$ is true ) = P( Type-I error )
$1 - \alpha$ = P( Accepting $H_0$ | $H_0$ is true )
$\beta$ = P( Accepting $H_0$ | $H_1$ is true ) = P( Type-II error )
$1 - \beta$ = P( Rejecting $H_0$ | $H_1$ is true )

We can represent these conditional probabilities as:

Table: Conditional probabilities

State of nature of hypotheses
Conclusion from sample evidence | $H_0$ is true | $H_0$ is false
Reject $H_0$ | $\alpha$ | $1 - \beta$
Do not reject (or accept) $H_0$ | $1 - \alpha$ | $\beta$


13.1.20 Level of Significance. The level of significance of a test is the maximum probability
with which we are willing to risk a Type-I error. It is the probability of obtaining a value of
the test statistic inside the critical region given that $H_0$ is true, i. e., it is the probability of
rejecting a true null hypothesis. It is denoted by $\alpha$.

P( Rejecting $H_0$ | $H_0$ is true ) = P( Type-I error ) = $\alpha$

The level of significance is also called the size of the critical region or the size of the
test. It is a small pre-assigned value, say, 0.05 or 0.01, which is generally specified before any
samples are drawn, so that the results obtained will not influence our choice.


13.1.21 Level of Confidence. The level of confidence is the probability of accepting a true
null hypothesis. It is denoted by $1 - \alpha$.

P( Accepting $H_0$ | $H_0$ is true ) = $1 - \alpha$
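
The conditional probabilities defined above can be evaluated numerically for a concrete test. The following is a minimal sketch, assuming a right-tailed Z test of $H_0: \mu \le \mu_0$ for a normal population with known $\sigma$. The function name and all numerical values ($\mu_0 = 100$, an alternative value $\mu_1 = 106$, $\sigma = 15$, $n = 36$) are hypothetical and chosen only for illustration; they do not come from the text.

```python
from math import sqrt
from statistics import NormalDist


def error_probabilities(mu0, mu1, sigma, n, alpha):
    """Return (P(Type-I error), P(Type-II error)) for a right-tailed Z test.

    H0: mu <= mu0 is tested against H1: mu > mu0.
    The Type-II probability is evaluated at one specific alternative mu1 > mu0.
    """
    se = sigma / sqrt(n)                        # standard error of the sample mean
    z_crit = NormalDist().inv_cdf(1 - alpha)    # critical value z_{1-alpha}
    x_crit = mu0 + z_crit * se                  # same boundary on the sample-mean scale
    # Type-I error: the sample mean falls above the boundary although mu = mu0
    type1 = 1 - NormalDist(mu0, se).cdf(x_crit)
    # Type-II error: the sample mean falls below the boundary although mu = mu1
    type2 = NormalDist(mu1, se).cdf(x_crit)
    return type1, type2


# Hypothetical illustration values (not taken from the text)
a, b = error_probabilities(mu0=100, mu1=106, sigma=15, n=36, alpha=0.05)
print(f"alpha = {a:.3f}, 1 - alpha = {1 - a:.3f}")
print(f"beta  = {b:.3f}, 1 - beta  = {1 - b:.3f}")
```

Note that $\beta$ depends on which value of $\mu$ under $H_1$ is taken as true, whereas $\alpha$ is fixed in advance by the analyst.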


Example 13.2 The prosecuting attorney in a trial attempts to show that the defendant is guilty.
The trial can be thought of as a test of hypothesis, since a decision is to be made as to whether the
defendant is guilty or innocent.

(a) State the null and alternative hypotheses of interest to the prosecuting attorney.

(b) Define the Type-I error and Type-II error for this situation.




Solution. Since the prosecuting attorney wants to show that the defendant is guilty, this specifies
the alternative hypothesis and the hypotheses are

$H_0$: Defendant is innocent        $H_1$: Defendant is guilty

A Type-I error would occur if the defendant were found guilty when, in fact, the defendant is innocent.
A Type-II error would occur if the defendant were found innocent when, in fact, the defendant is guilty.


13.1.22 Statistical Decision Rule. A statistical decision rule specifies, for each possible sample 
outcome, whether the null hypothesis should be rejected or not. It is also called a decision
function or a decision criterion.


13.1.23 Conclusion. A test of hypothesis leads to one of two conclusions. 
(i) If the observed value of the test statistic falls in the rejection region, the null hypothesis
$H_0$ is rejected in favour of the alternative hypothesis $H_1$.

(ii) If the observed value of the test statistic does not fall in the rejection region, the null
hypothesis is neither accepted nor rejected. It is, then, stated that there is insufficient
evidence to make a decision.


13.1.24 Test of Significance. The test of significance describes a process of testing a hypothesis
to gain a general impression about the parameter. No decision is imminent or even implied. In
this case, a level of significance is not present. Instead we investigate the values of $\alpha$ that would
lead to rejection of $H_0$ as opposed to the $\alpha$ values leading to acceptance of $H_0$. The test of
significance differs from hypothesis testing, which refers to the act of actually testing a hypothesis at
a selected level of significance to make a decision based on that conclusion.


13.1.25 Steps to Follow when Testing a Hypothesis. Since we will be conducting many tests 
of hypothesis, it is useful to follow a set procedure. In the remainder of this text, we will always 
follow the same format. The steps we will follow are given below: 


Before any sample observations are considered: 


1. Identify the population of interest and state the conditions required for the validity of the
test procedure being used.

2. Formulate and state the null hypothesis $H_0$ and the alternative hypothesis $H_1$.

3. Decide and specify the level of significance, $\alpha$.

4. Select the appropriate test statistic and its sampling distribution if $H_0$ is assumed to be
true.

5. Give the critical value (or values) of the test statistic for the desired value of $\alpha$.

6. Establish the rejection (critical) region.

7. State the decision rule: Reject $H_0$ if the value of the test statistic from the observed
sample falls in the rejection region, otherwise do not reject (or accept) $H_0$.

Now consider the sample values:

8. Calculate the value of the test statistic from the observed sample.

9. In the light of your decision rule, draw the conclusion as to whether to reject or accept
$H_0$ and then state the decision in managerial terms.
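
The steps above can be sketched as a small routine. This is only an illustrative sketch, assuming a normal population with known standard deviation so that the Z statistic applies; the function name `z_test`, its parameters and the numbers in the usage lines are hypothetical and are not part of the text.

```python
from math import sqrt
from statistics import NormalDist


def z_test(x_bar, mu0, sigma, n, alpha, tail):
    """Z test for a population mean, following steps 4 to 9.

    tail: 'left'  for H1: mu < mu0,
          'right' for H1: mu > mu0,
          'two'   for H1: mu != mu0.
    Returns (observed z, tuple of critical values, True if H0 is rejected).
    """
    std_normal = NormalDist()
    # Steps 5-7: critical value(s), rejection region and decision rule
    if tail == "left":
        crit = (std_normal.inv_cdf(alpha),)
    elif tail == "right":
        crit = (std_normal.inv_cdf(1 - alpha),)
    elif tail == "two":
        upper = std_normal.inv_cdf(1 - alpha / 2)
        crit = (-upper, upper)
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    # Step 8: observed value of the test statistic
    z = (x_bar - mu0) / (sigma / sqrt(n))
    # Step 9: conclusion
    if tail == "left":
        reject = z < crit[0]
    elif tail == "right":
        reject = z > crit[0]
    else:
        reject = z < crit[0] or z > crit[1]
    return z, crit, reject


# Hypothetical numbers, for illustration only
z, crit, reject = z_test(x_bar=52, mu0=50, sigma=6, n=36, alpha=0.05, tail="two")
print(round(z, 2), [round(c, 3) for c in crit], reject)
```

The level of significance and the tail are arguments fixed before the sample values are used, which mirrors the split between steps 1 to 7 and steps 8 to 9.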




Exercise 13.1


1. (a) Define the following concepts in your own words as fully as you can: 


(i) Hypothesis testing (ii) Statistical hypothesis 
(iii) Null hypothesis (iv) Test statistic 
(v) Level of significance (vi) Critical region

(b) Explain with examples the difference between:
(i) Estimation and hypothesis testing
(ii) Null hypothesis and alternative hypothesis
(iii) Simple hypothesis and composite hypothesis
(iv) Acceptance region and rejection region
(v) Type-I error and Type-II error
(vi) One-tailed test and two-tailed test
(c) What is the difference between a one-sided and a two-sided test? When should each be
used?

2. (a) Explain what is meant by: 
(i) Statistical hypothesis (ii) Test-statistic 
(iii) Significance level (iv) Test of significance

(b) Distinguish between the following concepts:
(i) Statistical estimation and hypothesis testing
(ii) Rejection and non-rejection regions
(iii) A test at $\alpha$ level of significance and a $1 - \alpha$ confidence level
3. (a) Explain how the null hypothesis and the alternative hypothesis are formulated. 
(b) State the null and alternative hypotheses to be used in testing the following claims:
(i) The mean rainfall at Lake Placid during the month of June is 2.8 inches.
(ii) No more than 20% of the faculty at Highland University contributed to the
annual giving fund.
(iii) The proportion of voters favouring Senator Foghorn in the upcoming election
is 0.58.
(iv) On the average children attend school within 3.8 miles of their homes in
suburban San Francisco.
{ (i) $H_0: \mu = 2.8$ inches against $H_1: \mu \ne 2.8$ inches; (ii) $H_0: \pi \le 0.20$ against
$H_1: \pi > 0.20$; (iii) $H_0: \pi = 0.58$ against $H_1: \pi \ne 0.58$; (iv) $H_0: \mu \le 3.8$ miles
against $H_1: \mu > 3.8$ miles. }
4. (a) Indicate Type-I and Type-II errors in the following statements:
(i) An innocent driver may be held by a traffic constable.
(ii) A judge can acquit a guilty person.
(iii) A deserving player may not be selected in the team.
(iv) A bad student may be passed by the examiner.

{ (i) Type-I, (ii) Type-II, (iii) Type-I, (iv) Type-II. }


(b) For each of the following situations, indicate the Type-I and Type-II errors and the
correct decisions.

(i) $H_0$: New system is no better than the old one.

Adopt new system when new one is better.
Retain old system when new one is better.
Retain old system when new one is not better.
Adopt new system when new one is not better.

(ii) $H_0$: New product is satisfactory.

Market new product when unsatisfactory.
Do not market new product when unsatisfactory.
Do not market new product when satisfactory.
Market new product when satisfactory.

{ (i) Correct, Type-II, Correct, Type-I; (ii) Type-II, Correct, Type-I, Correct }


(c) Suppose that a psychological testing service is asked to check whether an executive is
emotionally fit to assume the presidency of a large company. What type of error is
committed if the hypothesis that he is fit for the job is erroneously accepted? What type
of error is committed if the hypothesis that he is fit for the job is erroneously rejected?
( Type-II error, Type-I error )


5. (a) The director of an advertising agency is concerned with the effectiveness of a certain
kind of television commercial.

(i) What hypothesis is he testing, if he is committing a Type-I error when he says
erroneously that the commercial is effective?

(ii) What hypothesis is he testing, if he is committing a Type-II error when he says
erroneously that the commercial is effective?
{ (i) Commercial is not effective; (ii) Commercial is effective }

(b) Outline the fundamental procedure followed in testing a null hypothesis.




13.2 TEST OF HYPOTHESIS ABOUT 
A POPULATION MEAN, $\mu$


13.2.1 Forms of Hypothesis. Let $\mu_0$ be the hypothesized value of a population mean. The
three possible null hypotheses about a population mean, and their corresponding alternative
hypotheses, are:

1. $H_0: \mu \ge \mu_0$ against $H_1: \mu < \mu_0$
2. $H_0: \mu \le \mu_0$ against $H_1: \mu > \mu_0$
3. $H_0: \mu = \mu_0$ against $H_1: \mu \ne \mu_0$

In each case, the test will be made by obtaining a simple random sample of size $n$ and computing
the sample mean $\bar{X}$. Then $\bar{X}$ will be used in computing a test statistic. Depending upon the
calculated value of the test statistic, $H_0$ will be accepted or rejected.


13.2.2 Normal Population, $\sigma^2$ known. We know that for a normal population having a mean
$\mu_0$ and a known variance $\sigma^2$ the sampling distribution of $\bar{X}$ is normal with mean $\mu_0$ and a
standard error of $\sigma_{\bar{X}} = \sigma/\sqrt{n}$. Then the statistic

$$Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$

has the standard normal distribution. Consequently, the statistic $Z$ is the test statistic for testing a
hypothesis about the mean of a normal population whose variance $\sigma^2$ is known.


13.2.3 Normal Population, $\sigma^2$ unknown. If the variance of the population $\sigma^2$ is unknown,
then we estimate $\sigma^2$ by the sample variance $S^2$ and we estimate the standard error of $\bar{X}$ by $S_{\bar{X}}$,
where $S_{\bar{X}} = \hat{\sigma}_{\bar{X}} = S/\sqrt{n}$. If the population is normal, the statistic

$$T = \frac{\bar{X} - \mu_0}{S_{\bar{X}}} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

has a Student's t-distribution with $\nu = n - 1$ degrees of freedom. Consequently the statistic $T$
is the test statistic for testing a hypothesis about the mean of a normal population whose variance
$\sigma^2$ is unknown.


13.2.4 Any Population, $\sigma^2$ known/unknown. Finally recall the Central Limit Theorem
which states that the sampling distribution of $\bar{X}$ is approximately normal even for non-normal
populations if the sample size is sufficiently large (say, $n > 30$). Then we will use the test
statistic

$$Z = \frac{\bar{X} - \mu_0}{\sigma_{\bar{X}}} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \approx \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

to test a hypothesis about the mean of any population ( normal or not ).


The summary of the procedure of testing a hypothesis about the mean of a population whose
variance $\sigma^2$ is known/unknown, at a given significance level $\alpha$, for the three respective
alternative hypotheses is shown in the following table.

Testing mean of a population, $\sigma^2$ known/unknown:
Summary of procedure for 3 alternatives

Choose alternative hypothesis:
$H_1: \mu < \mu_0$ | $H_1: \mu > \mu_0$ | $H_1: \mu \ne \mu_0$
Reject $H_0$ if $Z < z_{\alpha}$ | Reject $H_0$ if $Z > z_{1-\alpha}$ | Reject $H_0$ if $Z < z_{\alpha/2}$ or $Z > z_{1-\alpha/2}$
We reject $H_0$ if $z < z_{\alpha}$, otherwise we do not reject $H_0$ | We reject $H_0$ if $z > z_{1-\alpha}$, otherwise we do not reject $H_0$ | We reject $H_0$ if $z < z_{\alpha/2}$ or $z > z_{1-\alpha/2}$, otherwise we do not reject $H_0$





















The summary of the procedure of testing a hypothesis about the mean of a normal population
whose variance $\sigma^2$ is unknown, at a given significance level $\alpha$, for the three respective
alternative hypotheses is shown in the following table.

Testing mean of a normal population, $\sigma^2$ unknown:
Summary of procedure for 3 alternatives

Choose alternative hypothesis:
$H_1: \mu < \mu_0$ | $H_1: \mu > \mu_0$ | $H_1: \mu \ne \mu_0$
Set level of significance: $\alpha$
Degrees of freedom: $\nu = n - 1$
Reject $H_0$ if $T < t_{\nu;\alpha}$ | Reject $H_0$ if $T > t_{\nu;1-\alpha}$ | Reject $H_0$ if $T < t_{\nu;\alpha/2}$ or $T > t_{\nu;1-\alpha/2}$
We reject $H_0$ if $t < t_{\nu;\alpha}$, otherwise we do not reject $H_0$ | We reject $H_0$ if $t > t_{\nu;1-\alpha}$, otherwise we do not reject $H_0$ | We reject $H_0$ if $t < t_{\nu;\alpha/2}$ or $t > t_{\nu;1-\alpha/2}$, otherwise we do not reject $H_0$


Choice of Test Statistic for Testing Population Mean. The appropriate test statistic for testing a
mean is either the $Z$ or the $T$ statistic, and depends upon the sample size and whether the
population distribution and population variance are known. Thus in testing the hypothesis about
the population mean we will use:
(i) The test statistic $Z$ when the population is normal and variance is known, or the sample
size is sufficiently large,
(ii) The test statistic $T$ when the population is normal, whose variance is unknown and
sample size is small ($n \le 30$).
This can be summarized as follows.

Population Distribution | Population Variance $\sigma^2$ | Sample Size $n$ | Appropriate Test Statistic
Normal | Known | Any | $Z$
Normal | Unknown | Any | $T$ ( If $n > 30$, can also use $Z$ as an approximation )
Any | Known or unknown | $n > 30$ | $Z$

where, when $\sigma^2$ is unknown, its estimate is the sample variance $S^2$.
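
The summary table can be read as a simple decision procedure. The sketch below encodes it under that reading; the function name and its arguments are hypothetical, and the case of a non-normal population with $n \le 30$ is returned as `None` because the table does not cover it.

```python
def choose_test_statistic(population_normal, variance_known, n):
    """Return 'Z', 'T' or None according to the summary table.

    population_normal: True if the population is assumed normal
    variance_known:    True if the population variance is known
    n:                 sample size
    """
    if population_normal and variance_known:
        return "Z"                 # normal population, variance known, any n
    if population_normal and not variance_known:
        # T applies for any n; for n > 30, Z may also be used as an approximation
        return "T"
    if n > 30:
        return "Z"                 # any population, large sample (Central Limit Theorem)
    return None                    # case not covered by the table


print(choose_test_statistic(True, True, 10))
print(choose_test_statistic(True, False, 25))
print(choose_test_statistic(False, False, 50))
```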


Example 13.3 A Bureau of Research statistician is designing an experiment to test
achievement scores of first year students in Government College, Gujranwala on a standardized
test, scored one through one hundred. He is willing to assume that the distribution of scores is
normal. He also believes that the true achievement of all first year students in Government
College, Gujranwala is greater than 58. If he has a sample of 36 such scores, what hypothesis
should he test?

Solution. Since he believes $\mu > 58$, he should test $H_0: \mu \le 58$ against $H_1: \mu > 58$. He
hopes to reject $H_0$. A rejection of $H_0$ is a positive action due to the overwhelming evidence that
$\mu$ is not less than or equal to 58. An acceptance of $H_0$ is a negative action because it implies
that there is not sufficient evidence to reject $H_0$.

Example 13.4 If the null hypothesis is $H_0: \mu \le 50$, when would a Type-II error be made?
Solution. When in fact $\mu > 50$ and $H_0$ is accepted.

Example 13.5 If the null hypothesis is $H_0: \mu \ge 100$, when would a Type-II error be made?
Solution. When in fact $\mu < 100$ and $H_0$ is accepted.


Example 13.6 A company claims that the average amount of coffee it supplies in jars is
6.0 oz with a standard deviation of 0.2 oz. A random sample of 100 jars is selected and the average
is found to be 5.9 oz. Is the company cheating the customers? Use 5% level of significance.

Solution. We have $n = 100$, $\bar{x} = 5.9$, $\sigma = 0.2$

The objective of the sampling is to attempt to support the research ( alternative ) hypothesis
that the average amount $\mu$ is less than 6.0 ounces. Consequently, we will test the null hypothesis
that $\mu \ge 6.0$ against the alternative hypothesis that $\mu < 6.0$.

The elements of the one-sided left tail test of hypothesis are:

Null hypothesis $H_0$: $\mu \ge 6.0$
Alternative hypothesis $H_1$: $\mu < 6.0$
Level of significance: $\alpha = 0.05$
Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ follows a standard normal distribution under $H_0$
Critical value: $z_{\alpha} = z_{0.05} = -1.645$ { From Table 10 (b) }
Critical region: $Z < -1.645$
Decision rule: Reject $H_0$ if $Z < -1.645$, otherwise do not reject $H_0$
Observed value: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{5.9 - 6.0}{0.2/\sqrt{100}} = -5.0$
Conclusion: Since $z = -5.0 < -1.645$, we reject $H_0$ and conclude that the average
amount of coffee is less than 6.0 ounces.
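
The arithmetic of Example 13.6 can be checked with a few lines of code. This is a minimal sketch using the standard normal distribution from Python's standard library in place of Table 10 (b); the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, sigma = 100, 5.9, 0.2     # sample size, sample mean, population s.d.
mu0, alpha = 6.0, 0.05              # hypothesized mean, level of significance

z_crit = NormalDist().inv_cdf(alpha)           # left-tail critical value z_alpha
z_obs = (x_bar - mu0) / (sigma / sqrt(n))      # observed value of Z
reject_h0 = z_obs < z_crit                     # left-tailed decision rule

print(round(z_crit, 3), round(z_obs, 2), reject_h0)
```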
Example 13.7 A company makes parachutes. The company has been buying snap links from a
manufacturing firm. The company is concerned that the quality of snap links they receive from the
firm might not be up to specifications. Specifically, the company wants to be convinced that snap
links will withstand a mean breaking strength of more than 5000 pounds. Perform a test of
hypothesis at the 0.005 significance level if the mean breaking strength for a random sample of
50 snap links is 5100 pounds. The population standard deviation is 221 pounds. What is
implied by the test result?

Solution. We have $n = 50$, $\bar{x} = 5100$, $\sigma = 221$

The objective of the sampling is to attempt to support the research ( alternative ) hypothesis that
the mean breaking strength $\mu$ is more than 5000 pounds. Consequently, we will test the null
hypothesis that $\mu \le 5000$ against the alternative hypothesis that $\mu > 5000$.

The elements of the one-sided right tail test of hypothesis are:

Null hypothesis $H_0$: $\mu \le 5000$
Alternative hypothesis $H_1$: $\mu > 5000$
Level of significance: $\alpha = 0.005 \Rightarrow 1 - \alpha = 1 - 0.005 = 0.995$
Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ follows a standard normal distribution under $H_0$
Critical value: $z_{1-\alpha} = z_{0.995} = 2.576$ { From Table 10 (d) }
Critical region: $Z > 2.576$
Decision rule: Reject $H_0$ if $Z > 2.576$, otherwise do not reject $H_0$.
Observed value: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \dfrac{5100 - 5000}{221/\sqrt{50}} = 3.20$
Conclusion: Since $z = 3.20 > 2.576$, we reject $H_0$ and conclude that the mean
breaking strength is more than 5000 pounds. Links from the firm are
up to specifications. Buy them.

Example 13.8 The mean lifetime of electric light bulbs produced by a company has in the past
been 1120 hours. A random sample of 36 electric bulbs recently chosen from a supply of newly
manufactured bulbs showed a mean lifetime of 1087 hours with a standard deviation of 120
hours. Test the hypothesis that the mean lifetime of light bulbs has not changed using a level of
significance of 0.05.

Solution. We have $n = 36$, $\bar{x} = 1087$, $s = 120$

The objective of the sampling is to attempt to support the research ( alternative ) hypothesis that
the mean lifetime $\mu$ has changed from 1120 hours. Consequently, we will test the null
hypothesis that $\mu = 1120$ against the alternative hypothesis that $\mu \ne 1120$.

The elements of the two-sided test of hypothesis are:

Null hypothesis $H_0$: $\mu = 1120$
Alternative hypothesis $H_1$: $\mu \ne 1120$
Level of significance: $\alpha = 0.05 \Rightarrow \alpha/2 = 0.025 \Rightarrow 1 - \alpha/2 = 0.975$
Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$ follows an approximate standard normal distribution under $H_0$.
Critical values: $z_{\alpha/2} = z_{0.025} = -1.960$, $z_{1-\alpha/2} = z_{0.975} = 1.960$ { From Table 10 (b) }
Critical region: $Z < -1.960$ or $Z > 1.960$
Decision rule: Reject $H_0$ if $Z < -1.960$ or $Z > 1.960$, otherwise do not reject $H_0$.
Observed value: $z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{1087 - 1120}{120/\sqrt{36}} = -1.65$
Conclusion: Since $-1.960 < z = -1.65 < 1.960$, we do not reject $H_0$ and conclude that
the mean lifetime is 1120 hours.

Example 13.9 We wish to test the hypothesis that the mean weight of a population of people is
140 lb. Using $\sigma = 15$ lb, $\alpha = 0.05$ and a sample of 36 people, find the values of $\bar{X}$ which
would lead to rejection of the hypothesis.

Solution. We have $n = 36$, $\sigma = 15$

The objective of the sampling is to attempt to support the research ( alternative ) hypothesis that
the mean weight $\mu$ is not equal to 140 pounds. Consequently, we will test the null hypothesis
that $\mu = 140$ against the alternative hypothesis that $\mu \ne 140$.

The elements of the two-sided test of hypothesis are:

Null hypothesis $H_0$: $\mu = 140$
Alternative hypothesis $H_1$: $\mu \ne 140$
Level of significance: $\alpha = 0.05 \Rightarrow \alpha/2 = 0.025 \Rightarrow 1 - \alpha/2 = 0.975$
Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ follows a standard normal distribution under $H_0$
Critical values: $z_{\alpha/2} = z_{0.025} = -1.960$, $z_{1-\alpha/2} = z_{0.975} = 1.960$ { From Table 10 (b) }
Critical region: $Z < -1.960$ or $Z > 1.960$
Decision rule: Reject $H_0$ if $Z < -1.960$ or $Z > 1.960$, i. e., if

$$\frac{\bar{X} - 140}{15/\sqrt{36}} < -1.960 \quad \text{or} \quad \frac{\bar{X} - 140}{15/\sqrt{36}} > 1.960$$

$$\bar{X} < 135.1 \quad \text{or} \quad \bar{X} > 144.9$$

Hence, the hypothesis $H_0: \mu = 140$ will be rejected if either $\bar{X} < 135.1$ or $\bar{X} > 144.9$.
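
The rejection limits for the sample mean in Example 13.9 follow from solving the two inequalities for the sample mean. A minimal sketch of that calculation is given below, with the standard library's normal distribution standing in for Table 10 (b); the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 140, 15, 36, 0.05

z_upper = NormalDist().inv_cdf(1 - alpha / 2)    # z_{1-alpha/2}
margin = z_upper * sigma / sqrt(n)               # critical distance from mu0
lower, upper = mu0 - margin, mu0 + margin        # reject H0 if x_bar < lower or x_bar > upper

print(round(lower, 1), round(upper, 1))
```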


Example 13.10 An ultrasonic equipment manufacturer uses a component in one of its machines
that must withstand considerable stress from vibrations while the machine is operating. An all-metal
component has been available for some years from a supplier. Extensive historical
experience has shown this component has a mean service life of 1100 hours. The research
division of the supplier has just developed a modified component constructed from plastic and
metal, bonded in a special way. The manufacturer wishes to know whether or not the mean
service life of the modified component ( $\mu$ ) exceeds the mean service life of 1100 hours for the
original component. Test the hypothesis at 1% level of significance if a sample of
$n = 36$ components yielded $\bar{x} = 1121$ and $s = 222$ hours.

Solution. We have $n = 36$, $\bar{x} = 1121$, $s = 222$

The objective of the sampling is to attempt to support the research ( alternative ) hypothesis that
the mean service life $\mu$ exceeds 1100 hours. Consequently, we will test the null hypothesis that
$\mu \le 1100$ against the alternative hypothesis that $\mu > 1100$.

The elements of the one-sided right tail test of hypothesis are:

Null hypothesis $H_0$: $\mu \le 1100$
Alternative hypothesis $H_1$: $\mu > 1100$
Level of significance: $\alpha = 0.01 \Rightarrow 1 - \alpha = 1 - 0.01 = 0.99$
Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$ follows an approximate standard normal distribution under $H_0$.
Critical value: $z_{1-\alpha} = z_{0.99} = 2.326$ { From Table 10 (d) }
Critical region: $Z > 2.326$
Decision rule: Reject $H_0$ if $Z > 2.326$, otherwise do not reject $H_0$.
Observed value: $z = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{1121 - 1100}{222/\sqrt{36}} = 0.568$
Conclusion: Since $z = 0.568 < 2.326$, we do not reject $H_0$ and conclude that
the mean service life does not exceed 1100 hours.


Example 13.11 A timber company is interested in seeing if the number of board feet per tree
has decreased since moving to a new location of timber. In the past, the company has had an average
of 93 board feet per tree. The company believes that the production has decreased since
changing locations. A random sample of 25 trees yields $\bar{x} = 89$ with $s = 20$. Assuming the
normality of the data, test the hypothesis at a 10% level of significance.

Solution. We have $n = 25$, $\bar{x} = 89$, $s = 20$

The objective of the sampling is to attempt to support the research ( alternative ) hypothesis that
the mean production $\mu$ has decreased from 93 board feet. Consequently, we will test the null
hypothesis that $\mu \ge 93$ against the alternative hypothesis that $\mu < 93$.

The elements of the one-sided left tail test of hypothesis are:

Null hypothesis $H_0$: $\mu \ge 93$
Alternative hypothesis $H_1$: $\mu < 93$
Level of significance: $\alpha = 0.10$
Test statistic: $T = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$ follows a t-distribution under $H_0$ with
Degrees of freedom: $\nu = n - 1 = 25 - 1 = 24$
Critical value: $t_{\nu;\alpha} = t_{24;0.10} = -1.318$ ( From Table 12 )
Critical region: $T < -1.318$
Decision rule: Reject $H_0$ if $T < -1.318$, otherwise do not reject $H_0$.
Observed value: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{89 - 93}{20/\sqrt{25}} = -1.000$
Conclusion: Since $t = -1.000 > -1.318$, we do not reject $H_0$ and conclude
that the mean production has not decreased from 93 board feet. The
tree company's claim has not been established.
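
The observed value and decision in Example 13.11 can be reproduced as follows. This is a minimal sketch: Python's standard library has no t-distribution, so the critical value is entered by hand as the tabulated figure quoted in the example rather than computed.

```python
from math import sqrt

n, x_bar, s = 25, 89, 20      # sample size, sample mean, sample s.d.
mu0 = 93                      # hypothesized mean
t_crit = -1.318               # t_{24; 0.10}, the tabulated value quoted in the example

t_obs = (x_bar - mu0) / (s / sqrt(n))    # observed value of T
reject_h0 = t_obs < t_crit               # left-tailed decision rule

print(t_obs, reject_h0)
```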


Example 13.12 Expensive test borings were made in an oil shale area to determine if the mean
yield of oil per ton of shale rock is greater than 4.5 barrels. Five borings, made at randomly
selected points in the area, indicated the following number of barrels per ton: 4.8, 5.4, 3.9, 4.9,
5.5. Suppose barrels per ton are normally distributed. Perform a test of hypothesis at the 5%
level of significance.

Solution. The mean and standard deviation of the sample are

$x$ | 4.8 | 5.4 | 3.9 | 4.9 | 5.5 | $\sum x = 24.5$
$x^2$ | 23.04 | 29.16 | 15.21 | 24.01 | 30.25 | $\sum x^2 = 121.67$

$$\bar{x} = \frac{\sum x}{n} = \frac{24.5}{5} = 4.9$$

$$s = \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n - 1}} = \sqrt{\frac{121.67 - 5(4.9)^2}{5 - 1}} = 0.64$$

The objective of the sampling is to attempt to support the research ( alternative ) hypothesis
that the mean yield $\mu$ is greater than 4.5 barrels. Consequently, we will test the null hypothesis
that $\mu \le 4.5$ against the alternative hypothesis that $\mu > 4.5$.

The elements of the one-sided right tail test of hypothesis are:

Null hypothesis $H_0$: $\mu \le 4.5$
Alternative hypothesis $H_1$: $\mu > 4.5$
Level of significance: $\alpha = 0.05 \Rightarrow 1 - \alpha = 0.95$
Test statistic: $T = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$ follows a t-distribution under $H_0$ with
Degrees of freedom: $\nu = n - 1 = 5 - 1 = 4$
Critical value: $t_{\nu;1-\alpha} = t_{4;0.95} = 2.132$ ( From Table 12 )
Critical region: $T > 2.132$
Decision rule: Reject $H_0$ if $T > 2.132$, otherwise do not reject $H_0$.
Observed value: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{4.9 - 4.5}{0.64/\sqrt{5}} = 1.40$
Conclusion: Since $t = 1.40 < 2.132$, we do not reject $H_0$ and conclude that the
mean yield is not greater than 4.5 barrels.
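
Starting from the raw observations of Example 13.12, the sample mean, the sample standard deviation and the observed value of T can be computed directly. The sketch below assumes the tabulated critical value quoted in the example, since the standard library does not provide the t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

yields = [4.8, 5.4, 3.9, 4.9, 5.5]    # barrels per ton from the five borings
mu0 = 4.5                             # hypothesized mean
t_crit = 2.132                        # t_{4; 0.95}, the tabulated value quoted in the example

n = len(yields)
x_bar = mean(yields)
s = stdev(yields)                     # sample standard deviation, divisor n - 1
t_obs = (x_bar - mu0) / (s / sqrt(n))
reject_h0 = t_obs > t_crit            # right-tailed decision rule

print(round(x_bar, 2), round(s, 2), round(t_obs, 2), reject_h0)
```

With the unrounded standard deviation the observed value is about 1.41 rather than the 1.40 obtained from $s$ rounded to 0.64; the decision is unchanged.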


Example 13.13 A cattle rancher has changed the type of feed he uses to fatten his cattle for
sale. The feed company claims that the new feed will increase the mean weight gain in his cattle
by more than 100 pounds per steer. Assuming the weight gain of cattle is normally distributed,
test the hypothesis of the feed company at $\alpha = 0.05$. Previously, the mean weight gain per steer
has been 800 pounds. A random sample of 30 yields a mean weight gain of $\bar{x} = 935$ pounds
with a standard deviation of 85 pounds.

Solution. We have $n = 30$, $\bar{x} = 935$, $s = 85$

The objective of the sampling is to attempt to support the research ( alternative ) hypothesis
that the mean weight gain $\mu$ is more than 900 ( i. e., 800 + 100 ) pounds. Consequently, we will test
the null hypothesis that $\mu \le 900$ against the alternative hypothesis that $\mu > 900$.

The elements of the one-sided right tail test of hypothesis are:

Null hypothesis $H_0$: $\mu \le 900$
Alternative hypothesis $H_1$: $\mu > 900$
Level of significance: $\alpha = 0.05 \Rightarrow 1 - \alpha = 0.95$
Test statistic: $T = \dfrac{\bar{X} - \mu_0}{S/\sqrt{n}}$ follows a t-distribution under $H_0$ with
Degrees of freedom: $\nu = n - 1 = 30 - 1 = 29$
Critical value: $t_{\nu;1-\alpha} = t_{29;0.95} = 1.699$ ( From Table 12 )
Critical region: $T > 1.699$
Decision rule: Reject $H_0$ if $T > 1.699$, otherwise do not reject $H_0$.
Observed value: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{935 - 900}{85/\sqrt{30}} = 2.26$
Conclusion: Since $t = 2.26 > 1.699$, we reject $H_0$ and conclude that the
mean weight gain is more than 900 ( i. e., 800 + 100 ) pounds. The feed
company's claim has been established.


Example 13.14 Workers at a production facility are required to assemble a certain part in. 2.3 
minutes in order to meet production criteria. The assembly rate per part is assumed to be 
normally distributed. Six workers are selected at random and timed in assembling a part. The 
assembly times (in minutes) for the six workers as follows. 

2.0 2.4 1.7 1:9 2.8 1.8 


The manager wants to determine if the mean for all workers afers from 2.3. Perform a test of 
hypothesis at the 5% level of significance. 


Solution. The mean and standard deviation of the sample are 






x; 20 — 24 1.7 19)5 2:8 18 | Xx, = 126 
x;—X 1° 03 -04 =02° 10:779)=03)) ie 
(x,-¥) | 0. 0.16 0.04 049 0.09 | X(x,-x)' = 0.88 
x = = 2:1 
$ = BES | a 
6-1 





The objective of the sampling is to attempt to support the research ( alternative ) hypothesis 
that the mean assembly time y differs from 2.3 minutes. Consequently, we will test the null 
hypothesis that 44 = 2.3 against the alternative hypothesis that pp + 2.3. 


The elements of the two-sided test of hypothesis are: 
Null hypothesis Hj: w= 23 
Alternative hypothesis H,: wp # 2,3 


Level of significance: ao = 0.05 = a/2=0.025 = 1 —- a/2 = 0.975 


Test statistic: = X= ele | follows a t-distribution under 
S/n H, with 

Degrees of freedom: v=n—1] ='6-—1 =235 

Critical values: | tyiai2 = 's:0025 = — 2.571, 





Critical region: $T < -2.571$ or $T > 2.571$
Decision rule: Reject $H_0$ if $T < -2.571$ or $T > 2.571$, otherwise do not reject $H_0$.
Observed value: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{2.1 - 2.3}{0.42/\sqrt{6}} = -1.166$
Conclusion: Since $-2.571 < t = -1.166 < 2.571$, we do not reject $H_0$ and conclude that the average assembly time is 2.3 minutes.
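
The arithmetic of Example 13.14 can be checked with a short Python sketch. It uses only the standard library; the critical value 2.571 is simply copied from the t-table value quoted in the example, and the variable names are illustrative. Because the sketch keeps $s$ unrounded, it gives $t \approx -1.168$ rather than the $-1.166$ obtained above with $s$ rounded to 0.42; the decision is the same.

```python
import math
import statistics

# Assembly times (minutes) for the six workers in Example 13.14.
times = [2.0, 2.4, 1.7, 1.9, 2.8, 1.8]
mu_0 = 2.3       # hypothesised mean under H0
t_crit = 2.571   # t_{5;0.975}, taken from the table value quoted in the text

n = len(times)
x_bar = statistics.mean(times)
s = statistics.stdev(times)                  # sample standard deviation, divisor n - 1
t_obs = (x_bar - mu_0) / (s / math.sqrt(n))  # one-sample t statistic

reject_h0 = abs(t_obs) > t_crit              # two-sided decision rule
print(f"x_bar = {x_bar:.2f}, s = {s:.3f}, t = {t_obs:.3f}, reject H0: {reject_h0}")
```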


Exercise 13.2 


1. (a) Why is the z-test usually inappropriate as a test statistic when the sample size is small?

(b) Define "Student's t-statistic". What are its assumptions? Explain briefly its use and importance in statistics.

2. For each of the following, a random sample of size $n$ is taken from a normal distribution with mean $\mu$ and variance $\sigma^2$. The sample mean is $\bar{x}$. Test the hypothesis stated, at the level of significance indicated:

Sample | $H_0$ | $H_1$ | Level of significance
(i) | $\mu = 15.8$ | $\mu \neq 15.8$ | 5%
(ii) | $\mu \leq 26.3$ | $\mu > 26.3$ | 5%
(iii) | $\mu \leq 123.5$ | $\mu > 123.5$ | 1%
(iv) | $\mu \geq 4.40$ | $\mu < 4.40$ | 2%
(v) | $\mu = 99.2$ | $\mu \neq 99.2$ | 5%
(vi) | $\mu \leq 99.2$ | $\mu > 99.2$ | 5%
(vii) | $\mu \geq 86.2$ | $\mu < 86.2$ | 10%
(viii) | $\mu = 7.0$ | $\mu \neq 7.0$ | 1%

{ (i) Since $z_{0.025} = -1.96 < z = -1.095 < 1.96 = z_{0.975}$, we do not reject $H_0: \mu = 15.8$ against $H_1: \mu \neq 15.8$.
(ii) Since $z = 1.845 > 1.645 = z_{0.95}$, we reject $H_0: \mu \leq 26.3$ in favour of $H_1: \mu > 26.3$.
(iii) Since $z = 2.5 > 2.326 = z_{0.99}$, we reject $H_0: \mu \leq 123.5$ in favour of $H_1: \mu > 123.5$.
(iv) Since $z = -2.778 < -2.054 = z_{0.02}$, we reject $H_0: \mu \geq 4.40$ in favour of $H_1: \mu < 4.40$.
(v) Since $z_{0.025} = -1.96 < z = 1.778 < 1.96 = z_{0.975}$, we do not reject $H_0: \mu = 99.2$ against $H_1: \mu \neq 99.2$.
(vi) Since $z = 1.792 > 1.645 = z_{0.95}$, we reject $H_0: \mu \leq 99.2$ in favour of $H_1: \mu > 99.2$.
(vii) Since $z = -1.428 < -1.282 = z_{0.10}$, we reject $H_0: \mu \geq 86.2$ in favour of $H_1: \mu < 86.2$.
(viii) Since $z_{0.005} = -2.576 < z = -2.5 < 2.576 = z_{0.995}$, we do not reject $H_0: \mu = 7.0$ against $H_1: \mu \neq 7.0$. }


3. (a) A random sample of size 36 is taken from a normal population with known variance $\sigma^2 = 25$. If the mean of the sample is $\bar{x} = 42.6$, test the null hypothesis $\mu \geq 45$ against the alternative hypothesis $\mu < 45$ with $\alpha = 0.05$ ($\alpha$ is the probability of committing a Type-I error).

(Since $z = -2.88 < -1.645 = z_{0.05}$, we reject $H_0: \mu \geq 45$ in favour of $H_1: \mu < 45$)

(b) Ten dry cells were taken from store and voltage tests gave the following results: 1.52, 1.53, 1.49, 1.48, 1.47, 1.49, 1.51, 1.50, 1.47, 1.48 volts. The mean voltage of the cells when stored was 1.51 V. Assuming the population standard deviation to remain unchanged at 0.02 V, is there reason to believe that the cells have deteriorated?

(Since $z = -2.53 < -1.645 = z_{0.05}$, we reject $H_0: \mu \geq 1.51$ in favour of $H_1: \mu < 1.51$)


4. (a) A sample of 400 male students is found to have a mean height of 67.47 inches. Can it be regarded as a simple random sample from a large population with mean height 67.39 inches and standard deviation 1.3 inches?

(Since $z_{0.025} = -1.96 < z = 1.23 < 1.96 = z_{0.975}$, we do not reject $H_0: \mu = 67.39$ against $H_1: \mu \neq 67.39$ at $\alpha = 0.05$)

(b) The mean lifetime of electric light bulbs produced by a company has in the past been 1120 hours with a standard deviation of 125 hours. A sample of 8 electric bulbs recently chosen from a supply of newly manufactured bulbs showed a mean lifetime of 1070 hours. Test the hypothesis that the mean lifetime of the bulbs has not changed, using a level of significance of 0.05.

(Since $z_{0.025} = -1.96 < z = -1.13 < 1.96 = z_{0.975}$, we do not reject $H_0: \mu = 1120$ against $H_1: \mu \neq 1120$)


5. (a) Suppose that the mean $\mu$ of a random variable $X$ is unknown but the variance of $X$ is known to be 144. Should we reject the null hypothesis $H_0: \mu = 15$ in favour of the alternative hypothesis $H_1: \mu \neq 15$ at a level of significance of $\alpha = 0.05$, if a random sample of 64 observations yields a mean $\bar{x} = 12$? What are the 95% confidence limits for $\mu$?

(Since $z = -2 < -1.960 = z_{0.025}$, we reject $H_0: \mu = 15$ in favour of $H_1: \mu \neq 15$; $9.06 < \mu < 14.94$)


(b) In a syrup filling factory the mean weight per bottle is claimed to be $\mu = 16$. A random sample of 100 bottles is taken and the test statistic used is $\bar{X}$, the mean weight of the sample. For a 0.05 level of significance, find the critical values for the test statistic and formulate the decision rule. Assume the weights to be normally distributed with $\sigma^2 = 2.56$.

(Reject $H_0$ if $\bar{X} < 15.68$ or $\bar{X} > 16.32$)

6. (a) A random sample of 36 drinks from a soft-drink machine has an average content of 7.6 ounces with a standard deviation of 0.48 ounces. Test the hypothesis $\mu \leq 7.5$ ounces against the alternative hypothesis $\mu > 7.5$ at the 0.05 level of significance.

(Since $z = 1.25 < 1.645 = z_{0.95}$, we do not reject $H_0: \mu \leq 7.5$ against $H_1: \mu > 7.5$)

(b) It is claimed that an automobile is driven on the average at most 20000 kilometres per year. To test this claim, a random sample of 100 automobile owners are asked to keep a record of the kilometres they travel. Would you agree with the claim if the random sample showed an average of 21500 kilometres and a standard deviation of 3900 kilometres? Use a 0.01 level of significance.

(Since $z = 3.846 > 2.326 = z_{0.99}$, we reject $H_0: \mu \leq 20000$ in favour of $H_1: \mu > 20000$)

7. (a) A random sample of 100 recorded deaths in the United States during the past year showed an average lifespan of 71.8 years with a standard deviation of 8.9 years. Does this seem to indicate that the average lifespan today is greater than 70 years? Use a 0.05 level of significance.

(Since $z = 2.02 > 1.645 = z_{0.95}$, we reject $H_0: \mu \leq 70$ in favour of $H_1: \mu > 70$)

(b) It is claimed that an automobile is driven on the average no more than 12000 miles per year. To test this claim, a random sample of 100 automobile owners are asked to keep a record of the miles they travel. Would you agree with the claim if the random sample showed an average of 12500 miles and a standard deviation of 2400 miles?

(Since $z = 2.083 > 1.645$, we reject $H_0: \mu \leq 12000$ in favour of $H_1: \mu > 12000$ at $\alpha = 0.05$)

8. (a) Given the following information, what is your conclusion in testing each of the indicated null hypotheses? Assume the populations are normal.









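Sample | Sample size | Sample mean | Estimate of variance from sample | Hypotheses | Level of significance
(i) | | | | | 0.01
(ii) | | | | | 0.01
(iii) | | | | | 0.02

Sample | $n$ | Sample mean | $\sum (x - \bar{x})^2$ | $H_0$ | $H_1$ | Level of significance
(iv) | 12 | | | $\mu \leq 24.1$ | $\mu > 24.1$ | 2.5%
(v) | 17 | | | $\mu = 40$ | $\mu \neq 40$ | 5%
(vi) | 6 | | | $\mu \leq 1503$ | $\mu > 1503$ | 5%
(vii) | 10 | | | $\mu \geq 133.0$ | $\mu < 133.0$ | 1%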








{ (i) Since $t = 0.44 < 2.602 = t_{15;0.99}$, we do not reject $H_0$.
(ii) Since $t = -1.25 > -2.492 = t_{24;0.01}$, we do not reject $H_0$.
(iii) Since $t_{24;0.01} = -2.492 < t = -0.714 < 2.492 = t_{24;0.99}$, we do not reject $H_0$.
(iv) Since $t = 2.622 > 2.201 = t_{11;0.975}$, we reject $H_0: \mu \leq 24.1$ in favour of $H_1: \mu > 24.1$.
(v) Since $t_{16;0.025} = -2.120 < t = -1.892 < 2.120 = t_{16;0.975}$, we do not reject $H_0: \mu = 40$ against $H_1: \mu \neq 40$.
(vi) Since $t = 2.152 > 2.015 = t_{5;0.95}$, we reject $H_0: \mu \leq 1503$ in favour of $H_1: \mu > 1503$.
(vii) Since $t = -3.073 < -2.821 = t_{9;0.01}$, we reject $H_0: \mu \geq 133.0$ in favour of $H_1: \mu < 133.0$. }

(b) A random sample of size $n$ is drawn from a normal population with mean 5 and variance $\sigma^2$.

(i) If $n = 25$, $\bar{x} = 3$ and $s = 2$, what is $t$?
(ii) If $n = 9$, $\bar{x} = 2$ and $t = -2$, what is $s$?
(iii) If $n = 25$, $s = 10$ and $t = 2$, what is $\bar{x}$?
(iv) If $s = 15$, $\bar{x} = 14$ and $t = 3$, what is $n$?

($t = -5$, $s = 4.5$, $\bar{x} = 9$, $n = 25$)

9. (a) Injection of a certain type of hormone into hens is said to increase the mean weight of eggs by 0.3 ounces. A sample of 30 eggs has an arithmetic mean 0.4 ounces above the pre-injection mean and a value of $s$ equal to 0.20. Is this enough reason to accept the statement that the mean increase is more than 0.3 ounces?

(Since $t = 2.74 > 1.699 = t_{29;0.95}$, we reject $H_0: \mu \leq 0.3$ in favour of $H_1: \mu > 0.3$ at $\alpha = 0.05$.)

(b) A random sample of 25 hens from a normal population showed that the average laying is 272 eggs per year with a variance of 625 eggs. The company claimed that the average laying is at least 285 eggs per year. Test the claim of the company at $\alpha = 0.05$.

(Since $t = -2.6 < -1.711 = t_{24;0.05}$, we reject $H_0: \mu \geq 285$ in favour of $H_1: \mu < 285$.)

10. (a) A producer of a certain make of flashlight dry cell batteries claims that its output has a mean life of 750 minutes. A random sample of 15 such batteries has been tested and a sample mean of 745 minutes and a sample standard deviation of 24 minutes have been





obtained. Verify that these results are consistent with the null hypothesis $\mu \geq 750$ against $\mu < 750$ at $\alpha = 0.01$.

(Since $t = -0.807 > -2.624 = t_{14;0.01}$, we do not reject $H_0: \mu \geq 750$ against $H_1: \mu < 750$.)

(b) Ten individuals are chosen at random from a normal population and the heights are found to be 63, 63, 66, 67, 67, 69, 70, 70, 71 and 71 inches. In the light of these data, discuss the suggestion that the mean height in the population is 66 inches.

(Since $t_{9;0.025} = -2.262 < t = 1.78 < 2.262 = t_{9;0.975}$, we do not reject $H_0: \mu = 66$ against $H_1: \mu \neq 66$ at $\alpha = 0.05$.)

11. (a) Ten cartons are taken at random from an automatic filling machine. The mean net weight of the 10 cartons is 15.90 oz and the sum of squared deviations from this mean is 0.276 (oz)$^2$. Does the sample mean differ significantly from the intended weight of 16 oz?

(Since $t_{9;0.025} = -2.262 < t = -1.806 < 2.262 = t_{9;0.975}$, we do not reject $H_0: \mu = 16$ against $H_1: \mu \neq 16$ at $\alpha = 0.05$.)

(b) In the past a machine has produced washers having a thickness of 0.050 inches. To determine whether the machine is in proper working order, a sample of 10 washers is chosen for which the mean thickness is 0.053 inches and the standard deviation is 0.003 inches. Test the hypothesis that the machine is in proper working order using a level of significance of 0.05.

(Since $t = 3.16 > 2.262 = t_{9;0.975}$, we reject $H_0: \mu = 0.050$ in favour of $H_1: \mu \neq 0.050$.)

12. (a) A random sample of 16 values from a normal population showed a mean of 41.5 inches and a sum of squares of deviations from this mean equal to 135 (inches)$^2$. Show that the assumption of a mean of 43.5 inches for the population is not reasonable and that the 95% confidence limits for this mean are 39.9 and 43.1 inches.

(Since $t = -2.667 < -2.131 = t_{15;0.025}$, we reject $H_0: \mu = 43.5$ in favour of $H_1: \mu \neq 43.5$ at $\alpha = 0.05$.)

(b) A random sample of nine from the men of a large city gave a mean height of 68 inches, and the unbiased estimate of the population variance from the sample, $s^2$, was 4.5 (inches)$^2$. Are these data consistent with the assumption of a mean height of 68.5 inches for the men of the city?

(Since $t_{8;0.025} = -2.306 < t = -0.708 < 2.306 = t_{8;0.975}$, we do not reject $H_0: \mu = 68.5$ against $H_1: \mu \neq 68.5$ at $\alpha = 0.05$.)




13.3 TEST OF HYPOTHESIS ABOUT A POPULATION PROPORTION, $\pi$


Tests concerning proportions are based on frequency or count data, which are outcomes of experiments such as the number of defective items in a production line, the number of errors made in typing a complex mathematical manuscript, and so forth. In this section we shall present the tests based on count data, where the test concerns the parameter $\pi$ of the Bernoulli distribution for both small and large sample sizes. The statistic for testing hypotheses concerning proportions is the number of successes $X$ or the proportion of successes $P$ in $n$ independently repeated Bernoulli trials. For small $n$, the tests require the use of binomial probabilities. For large $n$, the normal approximation to the binomial, using the $Z$ statistic, is appropriate; the $Z^2$ statistic then has an approximate $\chi^2$-distribution with one degree of freedom.


13.3.1 Forms of Hypothesis. Let $\pi_0$ be the hypothesised value of the population proportion. The three possible null hypotheses and their corresponding alternative hypotheses are:

1. $H_0: \pi \geq \pi_0$ against $H_1: \pi < \pi_0$
2. $H_0: \pi \leq \pi_0$ against $H_1: \pi > \pi_0$
3. $H_0: \pi = \pi_0$ against $H_1: \pi \neq \pi_0$


In each case, the test will be made by obtaining a simple random sample of size $n$, counting the number of successes $X$ in the sample and computing the sample proportion $P$. Then $P$ will be used in computing a test statistic. Depending upon the calculated value of the test statistic, $H_0$ will be rejected or not rejected.

13.3.2 Test based on Normal Approximation. We know that for a Bernoulli distribution with proportion of success $\pi_0$, the sampling distribution of the number of successes $X$ in a sample of size $n$ is a binomial distribution with parameters $n$ and $\pi_0$. The sampling distribution of the sample proportion $P = X/n$ is the same binomial distribution rescaled by $1/n$. The mean of the sampling distribution of $P$ is $\pi_0$ and its variance is $\pi_0 (1 - \pi_0)/n$. Consequently, the test statistic is

$$Z = \frac{P - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}$$

which is approximately standard normal for large samples because of the normal approximation to the binomial. Or, multiplying the numerator and denominator by $n$, we obtain the test statistic

$$Z = \frac{X - n\pi_0}{\sqrt{n\pi_0 (1 - \pi_0)}}$$

which is approximately standard normal when $n$ is large and $\pi_0$ is not too close to zero or one.

Strictly speaking, in computing $Z$ we should use a continuity correction factor to transform the discrete binomial into a continuous normal distribution, that is

$$Z = \frac{(X \pm 0.5) - n\pi_0}{\sqrt{n\pi_0 (1 - \pi_0)}} = \frac{\left(P \pm \frac{1}{2n}\right) - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}$$

We should use a plus sign (+) when $X < n\pi_0$ or $P < \pi_0$ and a minus sign (-) when $X > n\pi_0$ or $P > \pi_0$.
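The summary of the procedure of testing a hypothesis about the proportion of successes in a population, at a given significance level $\alpha$, for the three respective alternative hypotheses is shown in the following table.

Testing population proportion:
Summary of procedure for 3 alternatives

Choose alternative hypothesis | $H_0: \pi \geq \pi_0$, $H_1: \pi < \pi_0$ | $H_0: \pi \leq \pi_0$, $H_1: \pi > \pi_0$ | $H_0: \pi = \pi_0$, $H_1: \pi \neq \pi_0$
Set level of significance | $\alpha$ | $\alpha$ | $\alpha$
Test statistic | $Z = \dfrac{P - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}$ | same | same
Critical region | Reject $H_0$ if $Z < z_{\alpha}$ | Reject $H_0$ if $Z > z_{1-\alpha}$ | Reject $H_0$ if $Z < z_{\alpha/2}$ or $Z > z_{1-\alpha/2}$
Observed value | $z = \dfrac{p - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}$ | same | same
Decision | We reject $H_0$ if $z < z_{\alpha}$, otherwise we do not reject $H_0$ | We reject $H_0$ if $z > z_{1-\alpha}$, otherwise we do not reject $H_0$ | We reject $H_0$ if $z < z_{\alpha/2}$ or $z > z_{1-\alpha/2}$, otherwise we do not reject $H_0$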

Example 13.15 The reputations (and hence sales) of many businesses can be severely damaged by shipments of manufactured items that contain an unusually large percentage of defective items. A manufacturer of flashbulbs for cameras may want to be reasonably certain that less than 5% of its bulbs are defective. Suppose 300 bulbs are randomly selected from a very large shipment, each is tested, and 10 defective bulbs are found. Does this provide sufficient evidence for the manufacturer to conclude that the fraction defective in the entire shipment is less than 0.05? Use $\alpha = 0.01$.


Solution. We have $n = 300$, $x = 10$, $p = \dfrac{x}{n} = \dfrac{10}{300} = 0.033$

The objective of the sampling is to attempt to support the research (alternative) hypothesis that the proportion of defectives $\pi$ is less than 0.05. Consequently, we will test the null hypothesis that $\pi \geq 0.05$ against the alternative hypothesis that $\pi < 0.05$.


The elements of the one-sided left tail test of hypothesis are:
Null hypothesis: $H_0: \pi \geq 0.05$
Alternative hypothesis: $H_1: \pi < 0.05$
Level of significance: $\alpha = 0.01$
Test statistic: $Z = \dfrac{P - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}$ follows an approximate standard normal distribution under $H_0$
Critical value: $z_{\alpha} = z_{0.01} = -2.326$ { From Table 10 (b) }
Critical region: $Z < -2.326$
Decision rule: Reject $H_0$ if $Z < -2.326$, otherwise do not reject $H_0$.
Observed value: $z = \dfrac{p - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}} = \dfrac{0.033 - 0.05}{\sqrt{0.05 (1 - 0.05)/300}} = -1.351$
Conclusion: Since $z = -1.351 > -2.326$, we do not reject $H_0$. The manufacturer cannot conclude with 99% confidence that the shipment contains fewer than 5% defective bulbs.
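
As a cross-check of Example 13.15, the left-tail test can be sketched in Python. The function name `proportion_z` is a hypothetical helper introduced only for this illustration, and the critical value is the table value quoted above. The sketch keeps $p = 10/300$ unrounded, so it gives $z \approx -1.325$ instead of the $-1.351$ obtained with $p$ rounded to 0.033; the conclusion does not change.

```python
import math

def proportion_z(x, n, pi_0):
    """Z statistic for a hypothesised proportion pi_0 (normal approximation)."""
    p = x / n
    return (p - pi_0) / math.sqrt(pi_0 * (1 - pi_0) / n)

# Example 13.15: 10 defectives among 300 bulbs, H0: pi >= 0.05, alpha = 0.01.
z_obs = proportion_z(10, 300, 0.05)
z_crit = -2.326              # z_{0.01}, table value quoted in the text
reject_h0 = z_obs < z_crit   # left-tail decision rule
print(f"z = {z_obs:.3f}, reject H0: {reject_h0}")
```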


Example 13.16 It is known that approximately 10% of smokers prefer cigarette brand A. After a promotional campaign in a given sales region, a sample of 200 cigarette smokers were interviewed to determine the effectiveness of the campaign. The results of the survey showed that 26 people expressed a preference for brand A. Do these data present sufficient evidence to indicate an increase in the acceptance of brand A in the region? Use $\alpha = 0.05$.

Solution. We have $n = 200$, $x = 26$, $p = \dfrac{x}{n} = \dfrac{26}{200} = 0.13$

The objective of the sampling is to attempt to support the research (alternative) hypothesis that the proportion of smokers $\pi$ preferring brand A is greater than 0.10. Consequently, we will test the null hypothesis that $\pi \leq 0.10$ against the alternative hypothesis that $\pi > 0.10$.


The elements of the one-sided right tail test of hypothesis are:
Null hypothesis: $H_0: \pi \leq 0.10$
Alternative hypothesis: $H_1: \pi > 0.10$
Level of significance: $\alpha = 0.05 \Rightarrow 1 - \alpha = 0.95$
Test statistic: $Z = \dfrac{P - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}$ follows an approximate standard normal distribution under $H_0$
Critical value: $z_{1-\alpha} = z_{0.95} = 1.645$ { From Table 10 (b) }
Critical region: $Z > 1.645$
Decision rule: Reject $H_0$ if $Z > 1.645$, otherwise do not reject $H_0$.
Observed value: $z = \dfrac{p - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}} = \dfrac{0.13 - 0.10}{\sqrt{0.10 (1 - 0.10)/200}} = 1.414$
Conclusion: Since $z = 1.414 < 1.645$, we do not reject $H_0$. We cannot conclude with 95% confidence that the promotional campaign is effective.


Example 13.17 A supplier of components to a motor industry makes a particular product which sometimes fails immediately it is used. He controls his manufacturing process so that the proportion of faulty products is supposed to be only 4%. Out of 500 supplied in one batch 28 prove to be faulty. Has the process gone out of control to produce too many faulty products? Test at $\alpha = 0.05$ applying continuity correction.

Solution. We have $n = 500$, $x = 28$

The objective of the sampling is to attempt to support the research (alternative) hypothesis that the proportion of faulty products $\pi$ is greater than 0.04. Consequently, we will test the null hypothesis that $\pi \leq 0.04$ against the alternative hypothesis that $\pi > 0.04$.


The elements of the one-sided right tail test of hypothesis are:
Null hypothesis: $H_0: \pi \leq 0.04$
Alternative hypothesis: $H_1: \pi > 0.04$
Level of significance: $\alpha = 0.05 \Rightarrow 1 - \alpha = 0.95$
Test statistic: $Z = \dfrac{(X \pm 0.5) - n\pi_0}{\sqrt{n\pi_0 (1 - \pi_0)}}$ follows an approximate standard normal distribution under $H_0$
Critical value: $z_{1-\alpha} = z_{0.95} = 1.645$ { From Table 10 (b) }
Critical region: $Z > 1.645$
Decision rule: Reject $H_0$ if $Z > 1.645$, otherwise do not reject $H_0$.
Observed value: $z = \dfrac{(x \pm 0.5) - n\pi_0}{\sqrt{n\pi_0 (1 - \pi_0)}} = \dfrac{(28 - 0.5) - 500(0.04)}{\sqrt{500(0.04)(1 - 0.04)}} = 1.71$
Conclusion: Since $z = 1.71 > 1.645$, we reject $H_0$ and conclude that the process is out of control.
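
The sign rule for the continuity correction (add 0.5 when $X < n\pi_0$, subtract 0.5 when $X > n\pi_0$) can be written out as a small Python sketch. The helper name `corrected_z` is hypothetical, and applying no correction when $X$ equals $n\pi_0$ exactly is an assumption made here for completeness; the text does not cover that case.

```python
import math

def corrected_z(x, n, pi_0):
    """Continuity-corrected Z statistic for the number of successes x."""
    expected = n * pi_0
    if x > expected:
        adjusted = x - 0.5   # minus sign when X > n * pi_0
    elif x < expected:
        adjusted = x + 0.5   # plus sign when X < n * pi_0
    else:
        adjusted = x         # assumption: no correction when X equals n * pi_0
    return (adjusted - expected) / math.sqrt(n * pi_0 * (1 - pi_0))

# Example 13.17: 28 faulty out of 500, H0: pi <= 0.04, alpha = 0.05.
z_obs = corrected_z(28, 500, 0.04)
reject_h0 = z_obs > 1.645    # right-tail rule with the table value z_{0.95}
print(f"z = {z_obs:.2f}, reject H0: {reject_h0}")
```

Without the correction the statistic would be $8/\sqrt{19.2} \approx 1.83$, so the correction pulls the observed value towards zero and makes the test slightly more conservative.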

Example 13.18 The records of a certain hospital showed the birth of 723 males and 617 females in a certain week. Do these figures conform to the hypothesis that the sexes are born in equal proportions? Use $\alpha = 0.02$.

Solution. We have $x = 723$, $n - x = 617$, $n = 1340$

$p = \dfrac{x}{n} = \dfrac{723}{1340} = 0.54$

The objective of the sampling is to attempt to support the research (alternative) hypothesis that the proportion of male births $\pi$ is not equal to 0.5. Consequently, we will test the null hypothesis that $\pi = 0.5$ against the alternative hypothesis that $\pi \neq 0.5$.


The elements of the two-sided test of hypothesis are:
Null hypothesis: $H_0: \pi = 0.5$
Alternative hypothesis: $H_1: \pi \neq 0.5$
Level of significance: $\alpha = 0.02 \Rightarrow \alpha/2 = 0.01 \Rightarrow 1 - \alpha/2 = 0.99$
Test statistic: $Z = \dfrac{P - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}}$ follows an approximate standard normal distribution under $H_0$
Critical values: $z_{\alpha/2} = z_{0.01} = -2.326$, $z_{1-\alpha/2} = z_{0.99} = 2.326$ { From Table 10 (b) }
Critical region: $Z < -2.326$ or $Z > 2.326$
Decision rule: Reject $H_0$ if $Z < -2.326$ or $Z > 2.326$, otherwise do not reject $H_0$.
Observed value: $z = \dfrac{p - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}} = \dfrac{0.54 - 0.5}{\sqrt{0.5 (1 - 0.5)/1340}} = 2.928$
Conclusion: Since $z = 2.928 > 2.326$, we reject $H_0$ and conclude that the proportion of male births is not equal to 0.5.
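
The two-sided test of Example 13.18 can also be sketched without a printed table, assuming Python 3.8 or later, where `statistics.NormalDist` provides the standard normal quantile (`inv_cdf`) and distribution function (`cdf`). The p-value computed at the end is an addition for illustration and is not part of the textbook procedure. With $p = 723/1340$ unrounded the sketch gives $z \approx 2.896$ rather than 2.928; the decision is unchanged.

```python
import math
from statistics import NormalDist

x, n, pi_0, alpha = 723, 1340, 0.5, 0.02

p = x / n
z_obs = (p - pi_0) / math.sqrt(pi_0 * (1 - pi_0) / n)

std_normal = NormalDist()                      # mean 0, standard deviation 1
z_crit = std_normal.inv_cdf(1 - alpha / 2)     # z_{0.99}, about 2.326
reject_h0 = abs(z_obs) > z_crit                # two-sided decision rule

# Two-sided p-value (illustrative extra, not used in the text).
p_value = 2 * (1 - std_normal.cdf(abs(z_obs)))
print(f"z = {z_obs:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```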


Exercise 13.3 
1. (a) For each of the following sets of data, carry out a significance test for the hypotheses stated.

Number in sample | Number of successes | Hypotheses

{ (i) Since $z = 1.768 > 1.645 = z_{0.95}$, we reject $H_0: \pi \leq 0.8$ in favour of $H_1: \pi > 0.8$.
(ii) Since $z = 2.335 > 2.326 = z_{0.99}$, we reject $H_0: \pi = 0.55$ in favour of $H_1: \pi \neq 0.55$.





(iii) Since $z_{0.025} = -1.96 < z = 1.897 < 1.96 = z_{0.975}$, we do not reject $H_0: \pi = 1/4$ against $H_1: \pi \neq 1/4$.
(iv) Since $z_{0.005} = -2.576 < z = 2.179 < 2.576 = z_{0.995}$, we do not reject $H_0: \pi = 0.65$ against $H_1: \pi \neq 0.65$.
(v) Since $z = -3.060 < -2.326 = z_{0.01}$, we reject $H_0: \pi \geq 0.76$ in favour of $H_1: \pi < 0.76$. }

(b) A basketball player has hit on 60% of his shots from the floor. If on the next 100 shots he makes 70 baskets, would you say that his shooting has improved? (Use a 0.05 level of significance.)

(Since $z = 2.041 > 1.645 = z_{0.95}$, we reject $H_0: \pi \leq 0.60$ in favour of $H_1: \pi > 0.60$; Yes)

2. (a) In a poll of 10000 voters selected at random from all the voters in a certain district, it is found that 5180 voters are in favour of a particular candidate. Test the null hypothesis that the proportion of all the voters in the district who favour the candidate is equal to or less than 50% against the alternative that it is greater than 50%. Use a 0.05 level of significance.

(Since $z = 3.6 > 1.645 = z_{0.95}$, we reject $H_0: \pi \leq 0.5$ in favour of $H_1: \pi > 0.5$)

(b) A retailer places an order for 400 recapped automobile tires with a supplier who claims that no more than 5% of his output is ever returned unsatisfactory. In time, 31 of the 400 tires are unsatisfactory. Should the retailer continue to trust his supplier's word as to the rate of returns? Use 5% level of significance.

(Since $z = 2.524 > 1.645 = z_{0.95}$, we reject $H_0: \pi \leq 0.05$ in favour of $H_1: \pi > 0.05$; No)

(c) A commonly prescribed drug on the market for relieving nervous tension is believed to be only 60% effective. Experimental results with a new drug administered to a random sample of 100 adults who were suffering from nervous tension showed that 70 got relief. Is this sufficient evidence that the new drug is superior to the one commonly prescribed? Use 5% level of significance.

(Since $z = 2.041 > 1.645 = z_{0.95}$, we reject $H_0: \pi \leq 0.60$ in favour of $H_1: \pi > 0.60$; Yes)

3. (a) An electrical company claimed that at least 95% of the parts which they supplied on a government contract conformed to specification. A sample of 400 parts was tested, and 355 met specification. Can we accept the company's claim at a 0.05 level of significance?

(Since $z = -5.735 < -1.645 = z_{0.05}$, we reject $H_0: \pi \geq 0.95$ in favour of $H_1: \pi < 0.95$; No)

(b) An electric company claimed that at least 85% of the parts which they supplied conformed to specifications. A sample of 400 parts was tested and 75 did not meet specifications. Can we accept the company's claim at a 0.05 level of significance?

(Since $z = -2.100 < -1.645 = z_{0.05}$, we reject $H_0: \pi \geq 0.85$ in favour of $H_1: \pi < 0.85$; No)


(c) The manufacturer of a patent medicine claimed that it was at least 90% effective in relieving an allergy for a period of 8 hours. In a sample of 200 people who had the allergy, the medicine provided relief for 160 people. Determine whether the manufacturer's claim is legitimate.

(Since $z = -4.714 < -1.645 = z_{0.05}$, we reject $H_0: \pi \geq 0.90$ in favour of $H_1: \pi < 0.90$; No)

4. (a) A coin is tossed 100 times and 38 heads are obtained. Is there evidence, at the 2% level, that the coin is biased in favour of tails?

(Since $z = -2.4 < -2.05 = z_{0.02}$, we reject $H_0: \pi = 0.5$ in favour of $H_1: \pi < 0.5$; Yes)

(b) A coin is tossed 400 times and it turns up heads 216 times. Discuss whether the coin may be an unbiased one.

(Since $z_{0.025} = -1.960 < z = 1.6 < 1.960 = z_{0.975}$, we do not reject $H_0: \pi = 0.5$ against $H_1: \pi \neq 0.5$)

5. (a) In a random sample of 1000 houses in a certain city, 618 own colour TV sets. Is this sufficient evidence to conclude that 2/3 of the houses in this city have colour TV sets? Use $\alpha = 0.02$.

(Since $z = -3.288 < -2.326 = z_{0.01}$, we reject $H_0: \pi = 2/3$ in favour of $H_1: \pi \neq 2/3$; No)

(b) The sex distribution of 98 births reported in a newspaper was 52 boys and 46 girls. Is this consistent with an equal sex division in the population? Use 5% level of significance.

(Since $z_{0.025} = -1.960 < z = 0.594 < 1.960 = z_{0.975}$, we do not reject $H_0: \pi = 0.5$ against $H_1: \pi \neq 0.5$.)

6. (a) The reputations (and hence sales) of many businesses can be severely damaged by shipments of manufactured items that contain an unusually large percentage of defective items. A manufacturer of flashbulbs for cameras may want to be reasonably certain that less than 5% of its bulbs are defective. Suppose 300 bulbs are randomly selected from a very large shipment, each is tested, and 290 good bulbs are found. Does this provide sufficient evidence for the manufacturer to conclude that the fraction defective in the entire shipment is less than 0.05? Use $\alpha = 0.01$.

(Since $z = -1.351 > -2.326 = z_{0.01}$, we do not reject $H_0$. The manufacturer cannot conclude with 99% confidence that the shipment contains fewer than 5% defective bulbs)

(b) A supplier of components to a motor industry makes a particular product which sometimes fails immediately it is used. He controls his manufacturing process so that the proportion of faulty products is supposed to be only 4%. Out of 500 supplied in one batch 28 prove to be faulty. Has the process gone out of control to produce too many faulty products? Test at $\alpha = 0.05$ applying continuity correction.

(Since $z = 1.71 > 1.645 = z_{0.95}$, we reject $H_0: \pi \leq 0.04$ in favour of $H_1: \pi > 0.04$ and conclude that the process is out of control)





13.4 TEST OF HYPOTHESES ABOUT THE DIFFERENCE BETWEEN TWO POPULATION MEANS: INDEPENDENT SAMPLES


There are many problems where we are interested in hypotheses concerning the difference between the means of two populations. For instance, we may wish to decide upon the basis of suitable samples whether a new fertilizer is more effective than an existing fertilizer, or whether a newly introduced product is more reliable than an existing product. Specifically, within the framework of statistical language, we are interested in making inferences about the parameter $\mu_1 - \mu_2$. A test of hypothesis must be based on assumptions regarding the structure of the underlying distributions. Two independent random samples must be taken, one from each of the two populations of interest.

13.4.1 Forms of Hypothesis. We are interested in tests about the parameter $\mu_1 - \mu_2$. Let $\delta_0$ be a hypothesized value of the difference between two population means. The three possible null hypotheses about the difference between two population means, and their corresponding alternative hypotheses, are:

1. $H_0: \mu_1 - \mu_2 \geq \delta_0$ against $H_1: \mu_1 - \mu_2 < \delta_0$
2. $H_0: \mu_1 - \mu_2 \leq \delta_0$ against $H_1: \mu_1 - \mu_2 > \delta_0$
3. $H_0: \mu_1 - \mu_2 = \delta_0$ against $H_1: \mu_1 - \mu_2 \neq \delta_0$

13.4.2 Independent Samples: Normal populations, known variances, any sample sizes. Suppose that we have two independent random samples of sizes $n_1$ and $n_2$ from two normal populations having, respectively, means $\mu_1$ and $\mu_2$ and known variances $\sigma_1^2$ and $\sigma_2^2$. We know that $\bar{X}_1$ is distributed as $N(\mu_1, \sigma_1^2/n_1)$, that $\bar{X}_2$ is distributed as $N(\mu_2, \sigma_2^2/n_2)$, and that $\bar{X}_1$ is independent of $\bar{X}_2$. The statistic $\bar{X}_1 - \bar{X}_2$, the difference between two independent normal variables, is also normally distributed with mean $\mu_1 - \mu_2$ and variance $\sigma_1^2/n_1 + \sigma_2^2/n_2$. Consequently the random variable

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

has a standard normal distribution. Thus the appropriate test statistic is

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - \delta_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
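
A minimal Python sketch of the known-variance test statistic follows. The function name `two_sample_z` and the numbers passed to it are hypothetical and chosen only to illustrate the formula; they do not come from the text.

```python
import math

def two_sample_z(mean1, mean2, var1, var2, n1, n2, delta_0=0.0):
    """Z statistic for H0 about mu_1 - mu_2 when both population variances are known."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return ((mean1 - mean2) - delta_0) / standard_error

# Hypothetical illustration: sample means 52.0 and 50.0, known variances 16 and 9,
# sample sizes 40 and 36, hypothesised difference delta_0 = 0.
z_obs = two_sample_z(52.0, 50.0, 16.0, 9.0, 40, 36)
print(f"z = {z_obs:.3f}")
```

For a two-sided test at $\alpha = 0.05$ this observed value would be compared with $\pm 1.96$.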





13.4.3 Independent Samples: Normal populations, same unknown variance, small samples. Suppose that we have two independent random samples of sizes $n_1$ and $n_2$ ($n_1 < 30$ and $n_2 < 30$) from two normal populations having, respectively, means $\mu_1$ and $\mu_2$ and unknown common variance $\sigma^2$ (i.e., $\sigma_1^2 = \sigma_2^2 = \sigma^2$). Then $\bar{X}_1$ is normally distributed with mean $\mu_1$ and variance $\sigma^2/n_1$, $\bar{X}_2$ is normally distributed with mean $\mu_2$ and variance $\sigma^2/n_2$, and $\bar{X}_1$ is independent of $\bar{X}_2$.

Therefore, $\bar{X}_1 - \bar{X}_2$ is normally distributed with mean

$$\mu_{\bar{X}_1 - \bar{X}_2} = \mu_1 - \mu_2$$

and variance

$$\sigma^2_{\bar{X}_1 - \bar{X}_2} = \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2} = \sigma^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)$$

Thus the random variable

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

has a standard normal distribution. Since we assume that the two populations have equal unknown variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$ unknown), we will replace the population variance by the sample variance.

The difficulty in making this replacement lies in the fact that we have two estimates of $\sigma^2$, $S_1^2$ and $S_2^2$, since two different samples were collected. Even if $\sigma_1^2 = \sigma_2^2 = \sigma^2$, it is unlikely that the two samples collected will have exactly the same sample variance, because of sampling error. But if $S_1^2$ and $S_2^2$ differ, which of these two estimates should be used to estimate the unknown population variance $\sigma^2$?

Since we wish to obtain the best estimate available, it would seem reasonable to use an estimator that would pool the information from both samples. Thus, if $S_1^2$ and $S_2^2$ are the two sample variances (both estimating the variance $\sigma^2$ common to both populations), the pooled (weighted arithmetic mean) estimator of $\sigma^2$, denoted by $S_p^2$, is

$$S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2} = \frac{\sum (X_1 - \bar{X}_1)^2 + \sum (X_2 - \bar{X}_2)^2}{n_1 + n_2 - 2} = \frac{\left( \sum X_1^2 - n_1 \bar{X}_1^2 \right) + \left( \sum X_2^2 - n_2 \bar{X}_2^2 \right)}{n_1 + n_2 - 2}$$

To obtain the small-sample test statistic for testing $H_0: \mu_1 - \mu_2 = \delta_0$, substitute the pooled estimator of $\sigma^2$ into the above formula to obtain

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - \delta_0}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

which has a t-distribution with $\nu = n_1 + n_2 - 2$ degrees of freedom.
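
The pooled estimator and the resulting $T$ statistic can be sketched in Python as follows. The helper name `pooled_t` and the two small samples are hypothetical and serve only to illustrate the computation.

```python
import math
import statistics

def pooled_t(sample1, sample2, delta_0=0.0):
    """Return (t, degrees of freedom, pooled variance) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance uses the divisor n - 1, so (n - 1) * variance
    # recovers the sum of squared deviations from the sample mean.
    ss1 = (n1 - 1) * statistics.variance(sample1)
    ss2 = (n2 - 1) * statistics.variance(sample2)
    df = n1 + n2 - 2
    sp2 = (ss1 + ss2) / df                       # pooled estimate of sigma^2
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    t = (diff - delta_0) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df, sp2

# Hypothetical data, for illustration only.
first = [12.1, 11.8, 12.6, 12.3, 11.9]
second = [11.2, 11.7, 11.5, 11.0]
t_obs, df, sp2 = pooled_t(first, second)
print(f"t = {t_obs:.3f} with {df} degrees of freedom, pooled variance = {sp2:.4f}")
```

The observed value would then be compared with the tabulated $t_{\nu;1-\alpha}$ or $t_{\nu;1-\alpha/2}$ for $\nu = n_1 + n_2 - 2$, exactly as in the one-sample case.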


13.4.4 Independent Samples: Any populations, large samples. When both sample sizes are large (say greater than 30) the assumptions regarding small samples can be greatly relaxed. It is no longer necessary to assume that the parent distributions are normal, because the Central Limit Theorem assures that $\bar{X}_1$ is approximately normally distributed with mean $\mu_1$ and variance $\sigma_1^2/n_1$, and that $\bar{X}_2$ is also approximately normally distributed with mean $\mu_2$ and variance $\sigma_2^2/n_2$. Since $\bar{X}_1$ is independent of $\bar{X}_2$, $\bar{X}_1 - \bar{X}_2$ is approximately normally distributed with mean $\mu_1 - \mu_2$ and variance $\sigma_1^2/n_1 + \sigma_2^2/n_2$.

Thus the random variable

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

has an approximate standard normal distribution.

Because $n_1$ and $n_2$ are both large, the approximation remains valid if $\sigma_1^2$ and $\sigma_2^2$ are replaced by their sample variances $S_1^2$ and $S_2^2$. The assumption of equal variances is not required in inferences derived from large samples. We can still use the $Z$ test with $S_1^2$ substituted for $\sigma_1^2$ and $S_2^2$ substituted for $\sigma_2^2$, so long as both samples are large enough for the Central Limit Theorem to be applied.

That is, we use

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - \delta_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

as the test statistic.
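
The large-sample statistic and the three decision rules can be sketched together in Python, assuming Python 3.8 or later for `statistics.NormalDist`. The function names, the labels `"less"`, `"greater"` and `"two-sided"`, and the summary figures are all hypothetical choices made for this illustration.

```python
import math
from statistics import NormalDist

def large_sample_z(mean1, s1, n1, mean2, s2, n2, delta_0=0.0):
    """Z statistic with the sample variances substituted for sigma_1^2 and sigma_2^2."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return ((mean1 - mean2) - delta_0) / standard_error

def reject_h0(z, alpha, alternative):
    """Decision rule for H1: 'less', 'greater' or 'two-sided'."""
    std_normal = NormalDist()
    if alternative == "less":
        return z < std_normal.inv_cdf(alpha)
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")

# Hypothetical summary statistics, for illustration only.
z_obs = large_sample_z(68.2, 2.5, 50, 67.5, 2.8, 60)
print(f"z = {z_obs:.3f}, reject H0 (two-sided, 5%): {reject_h0(z_obs, 0.05, 'two-sided')}")
```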


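The summary of the procedure of testing a hypothesis about the difference between means of two populations whose variances $\sigma_1^2$, $\sigma_2^2$ are known/unknown, at a given significance level $\alpha$, for the three respective alternative hypotheses is shown in the following table.

Testing difference between means of two populations, $\sigma_1^2$, $\sigma_2^2$ known/unknown:
Summary of procedure for 3 alternatives

Choose alternative hypothesis | $H_0: \mu_1 - \mu_2 \geq \delta_0$, $H_1: \mu_1 - \mu_2 < \delta_0$ | $H_0: \mu_1 - \mu_2 \leq \delta_0$, $H_1: \mu_1 - \mu_2 > \delta_0$ | $H_0: \mu_1 - \mu_2 = \delta_0$, $H_1: \mu_1 - \mu_2 \neq \delta_0$
Set level of significance | $\alpha$ | $\alpha$ | $\alpha$
Test statistic | $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta_0}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$ (variances known) or $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta_0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$ (variances unknown, large samples) | same | same
Critical region | Reject $H_0$ if $Z < z_{\alpha}$ | Reject $H_0$ if $Z > z_{1-\alpha}$ | Reject $H_0$ if $Z < z_{\alpha/2}$ or $Z > z_{1-\alpha/2}$
Observed value | $z$, computed from the sample values | same | same
Decision | We reject $H_0$ if $z < z_{\alpha}$, otherwise we do not reject $H_0$ | We reject $H_0$ if $z > z_{1-\alpha}$, otherwise we do not reject $H_0$ | We reject $H_0$ if $z < z_{\alpha/2}$ or $z > z_{1-\alpha/2}$, otherwise we do not reject $H_0$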


The summary of the procedure of testing a hypothesis about the difference between means of two
normal populations with same unknown variance $\sigma_1^2 = \sigma_2^2 = \sigma^2$, at a given significance level
$\alpha$, for the three respective alternative hypotheses is shown in the following table.







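Testing difference between means of two normal populations, same unknown variance:
Summary of procedure for 3 alternatives

Choose alternative hypothesis | $H_0: \mu_1 - \mu_2 \le \delta_0$, $H_1: \mu_1 - \mu_2 > \delta_0$ | $H_0: \mu_1 - \mu_2 \ge \delta_0$, $H_1: \mu_1 - \mu_2 < \delta_0$ | $H_0: \mu_1 - \mu_2 = \delta_0$, $H_1: \mu_1 - \mu_2 \ne \delta_0$
Set level of significance | $\alpha$ | $\alpha$ | $\alpha$
Test statistic | $T = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta_0}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $S_p^2 = \dfrac{\sum (X_{1i} - \bar{X}_1)^2 + \sum (X_{2i} - \bar{X}_2)^2}{n_1 + n_2 - 2}$ and $\nu = n_1 + n_2 - 2$
Decision rule | Reject $H_0$ if $T > t_{\nu;\,1-\alpha}$ | Reject $H_0$ if $T < t_{\nu;\,\alpha}$ | Reject $H_0$ if $T < t_{\nu;\,\alpha/2}$ or $T > t_{\nu;\,1-\alpha/2}$
Observed value | $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $s_p^2 = \dfrac{\sum (x_{1i} - \bar{x}_1)^2 + \sum (x_{2i} - \bar{x}_2)^2}{n_1 + n_2 - 2}$
Conclusion | We reject $H_0$ if $t > t_{\nu;\,1-\alpha}$, otherwise we do not reject $H_0$ | We reject $H_0$ if $t < t_{\nu;\,\alpha}$, otherwise we do not reject $H_0$ | We reject $H_0$ if $t < t_{\nu;\,\alpha/2}$ or $t > t_{\nu;\,1-\alpha/2}$, otherwise we do not reject $H_0$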
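
As a rough illustration of the pooled procedure, the sketch below computes the observed t and its degrees of freedom directly from two raw samples. The function name `pooled_t` is an assumption made here. The data are the milk yields of Exercise 13.4, Question 6 (a), and the critical value still has to be read from Table 12.

```python
import math

def pooled_t(sample1, sample2, delta0=0.0):
    """Pooled two-sample t for H0: mu1 - mu2 = delta0 (equal variances assumed)."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # Sums of squared deviations about each sample mean
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    df = n1 + n2 - 2
    sp = math.sqrt((ss1 + ss2) / df)  # pooled standard deviation
    t = ((mean1 - mean2) - delta0) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, df

breed_a = [16, 15, 18, 17, 19, 17]
breed_b = [18, 22, 21, 23, 19, 20, 24, 21]
t, df = pooled_t(breed_b, breed_a)  # breed B minus breed A
print(round(t, 3), df)  # 4.162 12
```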










Example 13.19 Apex's current packaging machinery for coffee is known to pour ground coffee into
“1-pound cans” with a standard deviation of 0.6 ounce. Apex is considering using a new
packaging machine which is expected to pour coffee into “1-pound cans” more accurately, with a
standard deviation of 0.3 ounce. Before deciding to invest in the new machine, Apex wished to
test its performance against the old machine. A sample was taken on each machine to measure
the mean weight of contents of the “1-pound can”, with the following results.

 | Sample size | Mean weight
Using Old Machine | 25 | 16.7 ounces
Using New Machine | 36 | 15.8 ounces

Test, at the 5% level of significance, the hypothesis that there is no difference in the average
weight of the contents poured by the old machine versus the new machine.

Solution. We have $n_1 = 25$, $\bar{x}_1 = 16.7$, $\sigma_1 = 0.6$
$n_2 = 36$, $\bar{x}_2 = 15.8$, $\sigma_2 = 0.3$

The objective of the sampling is to attempt to support the research (alternative) hypothesis
that the average weight of the contents poured by the old machine $\mu_1$ differs from that of the
new machine $\mu_2$. Consequently, we will test the null hypothesis that $\mu_1 = \mu_2$ against the
alternative hypothesis that $\mu_1 \ne \mu_2$.

The elements of the two-sided test of hypothesis are:

Null hypothesis $H_0: \mu_1 = \mu_2 \Rightarrow \mu_1 - \mu_2 = 0$ (i.e. $\delta_0 = 0$)

Alternative hypothesis $H_1: \mu_1 \ne \mu_2 \Rightarrow \mu_1 - \mu_2 \ne 0$ (i.e. $\delta_0 \ne 0$)

Level of significance: $\alpha = 0.05 \Rightarrow \alpha/2 = 0.025 \Rightarrow 1 - \alpha/2 = 0.975$

Test statistic: $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$ follows a standard normal distribution under $H_0$

Critical values: $z_{\alpha/2} = z_{0.025} = -1.960$,
$z_{1-\alpha/2} = z_{0.975} = 1.960$ { From Table 10 (b) }
Critical region: $Z < -1.960$ or $Z > 1.960$
Decision rule: Reject $H_0$ if $Z < -1.960$ or $Z > 1.960$, otherwise do not reject $H_0$.
Observed value: $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{(16.7 - 15.8) - 0}{\sqrt{\dfrac{(0.6)^2}{25} + \dfrac{(0.3)^2}{36}}} = 6.92$
Conclusion: Since $z = 6.92 > 1.960$, we reject $H_0$ and conclude that the mean
weight of a “1-pound can” filled by the new machine is not the same
as that filled by the old machine.
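
The arithmetic of this example can be checked with a short script. It is a sketch for verification only. The variable names are chosen here, and the critical value 1.960 is taken from the solution above.

```python
import math

n1, mean1, sigma1 = 25, 16.7, 0.6  # old machine
n2, mean2, sigma2 = 36, 15.8, 0.3  # new machine
delta0 = 0.0                       # hypothesised difference under H0

standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
z = ((mean1 - mean2) - delta0) / standard_error
reject_h0 = z < -1.960 or z > 1.960  # two-sided rule at alpha = 0.05

print(round(standard_error, 2), round(z, 2), reject_h0)  # 0.13 6.92 True
```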


Example 13.20 A test was given to a group of 100 scouts and to a group of 144 guides.
The mean score for the scouts was 27.53 and the mean score for the guides was 26.81.
Assuming a common population standard deviation of 3.48, test, using a 5% level of
significance, whether the scouts' performance in the test was better than that of the guides.

Solution. We have $n_1 = 100$, $\bar{x}_1 = 27.53$
$n_2 = 144$, $\bar{x}_2 = 26.81$
Common population standard deviation: $\sigma = 3.48$

The objective of the sampling is to attempt to support the research (alternative) hypothesis
that the mean score for the scouts $\mu_1$ is greater than that of the guides $\mu_2$. Consequently, we
will test the null hypothesis that $\mu_1 \le \mu_2$ against the alternative hypothesis that $\mu_1 > \mu_2$.
The elements of the one-sided right tail test of hypothesis are:

Null hypothesis $H_0: \mu_1 \le \mu_2 \Rightarrow \mu_1 - \mu_2 \le 0$ (i.e. $\delta_0 \le 0$)
Alternative hypothesis $H_1: \mu_1 > \mu_2 \Rightarrow \mu_1 - \mu_2 > 0$ (i.e. $\delta_0 > 0$)

Level of significance: $\alpha = 0.05 \Rightarrow 1 - \alpha = 0.95$

Test statistic: $Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - \delta_0}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ follows a standard normal distribution under $H_0$

Critical value: $z_{1-\alpha} = z_{0.95} = 1.645$ { From Table 10 (b) }
Critical region: $Z > 1.645$
Decision rule: Reject $H_0$ if $Z > 1.645$, otherwise do not reject $H_0$.
Observed value: $z = \dfrac{(\bar{x}_1 - \bar{x}_2) - \delta_0}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{(27.53 - 26.81) - 0}{3.48\sqrt{\dfrac{1}{100} + \dfrac{1}{144}}} = 1.589$

Conclusion: Since $z = 1.589 < 1.645$, we do not reject $H_0$ and conclude that
there is not sufficient evidence, at 5% level of significance, to show
that the performance of the scouts in the test was better than that of the
guides.
Example 13.21 The management of a restaurant wants to determine whether a new advertising
campaign has increased its mean daily income (gross). The incomes for 50 business days prior
to the campaign's beginning were recorded. After conducting the advertising campaign and
allowing a 20-day period for the advertising to take effect, the restaurant management recorded
the income for 30 business days. These two samples will allow the management to make an
inference about the effect of the advertising campaign on the restaurant's daily income. A
summary of the results of the two samples is shown below:

 | Size | Mean | Standard deviation
Before campaign | $n_1 = 50$ | $\bar{x}_1 = 1255$ | $s_1 = 215$
After campaign | $n_2 = 30$ | $\bar{x}_2 = 1330$ | $s_2 = 238$

Do these samples provide sufficient evidence for the management to conclude that the mean
income has been increased by the advertising campaign? Test using $\alpha = 0.05$.

Solution. The objective of the sampling is to attempt to support the research (alternative)
hypothesis that the mean daily income after the advertising campaign $\mu_2$ is greater than that
before the campaign $\mu_1$. Consequently, we will test the null hypothesis that $\mu_2 \le \mu_1$ against the
alternative hypothesis that $\mu_2 > \mu_1$.

The elements of the one-sided right tail test of hypothesis are:

Null hypothesis $H_0: \mu_2 \le \mu_1 \Rightarrow \mu_2 - \mu_1 \le 0$ (i.e. $\delta_0 \le 0$)

Alternative hypothesis $H_1: \mu_2 > \mu_1 \Rightarrow \mu_2 - \mu_1 > 0$ (i.e. $\delta_0 > 0$)

Level of significance: $\alpha = 0.05 \Rightarrow 1 - \alpha = 0.95$

Test statistic: $Z = \dfrac{(\bar{X}_2 - \bar{X}_1) - \delta_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$ follows a standard normal distribution under $H_0$

Critical value: $z_{1-\alpha} = z_{0.95} = 1.645$ { From Table 10 (b) }
Critical region: $Z > 1.645$
Decision rule: Reject $H_0$ if $Z > 1.645$, otherwise do not reject $H_0$.
Observed value: $z = \dfrac{(\bar{x}_2 - \bar{x}_1) - \delta_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{(1330 - 1255) - 0}{\sqrt{\dfrac{(215)^2}{50} + \dfrac{(238)^2}{30}}} = 1.414$
Conclusion: Since $z = 1.414 < 1.645$, we do not reject $H_0$. That is, the samples
do not provide sufficient evidence, at the $\alpha = 0.05$ significance level,
for the restaurant management to conclude that the advertising
campaign has increased the mean daily income.
Example 13.22 A feeding test is conducted on a herd of 25 milking cows to compare two diets,
one of dewatered alfalfa and the other of field-wilted alfalfa. Dewatered alfalfa has an economic
advantage in that its mechanical processing produces a liquid protein-rich by-product that can be
used to supplement the feed of other animals. A sample of 12 cows randomly selected from the
herd are fed dewatered alfalfa; the remaining 13 cows are fed field-wilted alfalfa. From
observations made over a three-week period, the average daily milk production in pounds
recorded for each cow is:

Field-wilted alfalfa |
Dewatered alfalfa | 35 47 55 29 40 39 32 41 42 57 51 39

Do the data strongly indicate that the milk yield is less with dewatered alfalfa than with field-
wilted alfalfa? Test at $\alpha = 0.05$.
Solution. The means of the two samples and the pooled estimate of the common standard deviation are obtained from

$n_1 = 13$, $\sum x_1 = 587$, $\sum x_1^2 = 27273$ (field-wilted alfalfa)
$n_2 = 12$, $\sum x_2 = 507$, $\sum x_2^2 = 22261$ (dewatered alfalfa)

$\bar{x}_1 = \dfrac{\sum x_1}{n_1} = \dfrac{587}{13} = 45.15$
$\bar{x}_2 = \dfrac{\sum x_2}{n_2} = \dfrac{507}{12} = 42.25$
$\sum (x_1 - \bar{x}_1)^2 = \sum x_1^2 - \dfrac{(\sum x_1)^2}{n_1} = 27273 - \dfrac{(587)^2}{13} = 767.69$
$\sum (x_2 - \bar{x}_2)^2 = \sum x_2^2 - \dfrac{(\sum x_2)^2}{n_2} = 22261 - \dfrac{(507)^2}{12} = 840.25$
$s_p = \sqrt{\dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{767.69 + 840.25}{13 + 12 - 2}} = 8.36$

The objective of the sampling is to attempt to support the research (alternative) hypothesis
that the mean milk yield with dewatered alfalfa $\mu_2$ is less than with field-wilted alfalfa $\mu_1$.
Consequently, we will test the null hypothesis that $\mu_2 \ge \mu_1$ against the alternative hypothesis
that $\mu_2 < \mu_1$.
The elements of the one-sided left tail test of hypothesis are:
Null hypothesis $H_0: \mu_2 \ge \mu_1 \Rightarrow \mu_2 - \mu_1 \ge 0$ (i.e. $\delta_0 \ge 0$)
Alternative hypothesis $H_1: \mu_2 < \mu_1 \Rightarrow \mu_2 - \mu_1 < 0$ (i.e. $\delta_0 < 0$)
Level of significance: $\alpha = 0.05$

Test statistic: $T = \dfrac{(\bar{X}_2 - \bar{X}_1) - \delta_0}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ follows a t-distribution under $H_0$ with

Degrees of freedom: $\nu = n_1 + n_2 - 2 = 13 + 12 - 2 = 23$

Critical value: $t_{\nu;\,\alpha} = t_{23;\,0.05} = -1.714$ (From Table 12)
Critical region: $T < -1.714$
Decision rule: Reject $H_0$ if $T < -1.714$, otherwise do not reject $H_0$.
Observed value: $t = \dfrac{(\bar{x}_2 - \bar{x}_1) - \delta_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{(42.25 - 45.15) - 0}{8.36\sqrt{\dfrac{1}{13} + \dfrac{1}{12}}} = -0.87$
Conclusion: Since $t = -0.87 > -1.714$, we do not reject $H_0: \mu_2 \ge \mu_1$
against $H_1: \mu_2 < \mu_1$.
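
When only the sums and sums of squares are at hand, the pooled t of this example can be reproduced as sketched below. The helper name `corrected_ss` is an assumption made here. The value $\sum x_1 = 587$ is not legible in the worked solution and is inferred from $n_1 = 13$, $\sum x_1^2 = 27273$ and the corrected sum of squares 767.69.

```python
import math

def corrected_ss(sum_x, sum_x2, n):
    """Sum of squared deviations about the mean: sum(x^2) - (sum x)^2 / n."""
    return sum_x2 - sum_x**2 / n

n1, sum1, sumsq1 = 13, 587, 27273  # field-wilted alfalfa (sum1 inferred, see above)
n2, sum2, sumsq2 = 12, 507, 22261  # dewatered alfalfa

ss1 = corrected_ss(sum1, sumsq1, n1)
ss2 = corrected_ss(sum2, sumsq2, n2)
sp = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))  # pooled standard deviation
t = (sum2 / n2 - sum1 / n1) / (sp * math.sqrt(1 / n1 + 1 / n2))

print(round(ss1, 2), round(ss2, 2), round(sp, 2), round(t, 2))
# 767.69 840.25 8.36 -0.87
```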


Example 13.23 Two random samples taken independently from normal populations with an 
identical variance yield the following results. 













Test the hypothesis that the true difference between the population means is at most 10, that is
$\mu_2 - \mu_1 \le 10$, against the alternative that $\mu_2 - \mu_1 > 10$, at the 5% level of significance.

Solution. The pooled estimate of the common standard deviation is

$s_p = \sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{(12 - 1)1200 + (18 - 1)900}{12 + 18 - 2}} = 31.9$

The objective of the sampling is to attempt to support the research (alternative) hypothesis
that the mean of the second population $\mu_2$ is greater than that of the first population $\mu_1$ by more than
10 points. Consequently, we will test the null hypothesis that $\mu_2 - \mu_1 \le 10$ against the
alternative hypothesis that $\mu_2 - \mu_1 > 10$.

The elements of the one-sided right tail test of hypothesis are:

Null hypothesis $H_0: \mu_2 - \mu_1 \le 10$ (i.e. $\delta_0 \le 10$)
Alternative hypothesis $H_1: \mu_2 - \mu_1 > 10$ (i.e. $\delta_0 > 10$)
Level of significance: $\alpha = 0.05 \Rightarrow 1 - \alpha = 0.95$

Test statistic: $T = \dfrac{(\bar{X}_2 - \bar{X}_1) - \delta_0}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ follows a t-distribution under $H_0$ with

Degrees of freedom: $\nu = n_1 + n_2 - 2 = 12 + 18 - 2 = 28$
Critical value: $t_{\nu;\,1-\alpha} = t_{28;\,0.95} = 1.701$ (From Table 12)
Critical region: $T > 1.701$
Decision rule: Reject $H_0$ if $T > 1.701$, otherwise do not reject $H_0$.
Observed value: $t = \dfrac{(\bar{x}_2 - \bar{x}_1) - \delta_0}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = 0.421$, with $\delta_0 = 10$, $s_p = 31.9$, $n_1 = 12$ and $n_2 = 18$
Conclusion: Since $t = 0.421 < 1.701 = t_{28;\,0.95}$, we do not reject $H_0$.
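
When the sample variances are given instead of raw data, the pooled standard deviation weights each variance by its degrees of freedom. A minimal sketch, using the figures $s_1^2 = 1200$, $n_1 = 12$, $s_2^2 = 900$, $n_2 = 18$ from the solution above (the function name `pooled_sd` is an assumption made here):

```python
import math

def pooled_sd(var1, n1, var2, n2):
    """Pooled standard deviation from two sample variances (n - 1 divisors)."""
    return math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))

sp = pooled_sd(1200, 12, 900, 18)
standard_error = sp * math.sqrt(1 / 12 + 1 / 18)  # denominator of the t statistic

print(round(sp, 1), round(standard_error, 2))  # 31.9 11.89
```

Dividing the difference between the observed and hypothesised mean differences by this standard error gives the observed t.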





Exercise 13.4

1. (a) For each of the following sets of data, perform a test to decide whether there is a
significant difference between the means, $\mu_1$ and $\mu_2$, of the normal populations from
which the samples are drawn.











Bs 


oo 







(i) 


(vii) 


(vill) 





{ @ 
(i) 
(ii 
oe 


(vy) 


a: 


Tafa] af a ee 
| 4250 30 


5% 


2% — 


-| Common Population 
standard deviation o | 


10% 


Since $z = -2.096 < -1.96 = z_{0.025}$, we reject $H_0: \mu_1 = \mu_2$ in favour of
$H_1: \mu_1 \ne \mu_2$.

Since $z = 1.402 > -2.054 = z_{0.02}$, we do not reject $H_0: \mu_1 \ge \mu_2$
against $H_1: \mu_1 < \mu_2$.

Since $z = 2.493 > 2.326 = z_{0.99}$, we reject $H_0: \mu_1 \le \mu_2$ in favour of
$H_1: \mu_1 > \mu_2$.

Since $z = 2.076 > 1.645 = z_{0.95}$, we reject $H_0: \mu_1 \le \mu_2$ in favour of
$H_1: \mu_1 > \mu_2$.

Since $z = -2.036 > -2.326 = z_{0.01}$, we do not reject $H_0: \mu_1 \ge \mu_2$ against
$H_1: \mu_1 < \mu_2$.

Since $z = 1.783 > 1.645 = z_{0.95}$, we reject $H_0: \mu_1 = \mu_2$ in favour of
$H_1: \mu_1 \ne \mu_2$.



(vii) Since $z = 1.752 > 1.645 = z_{0.95}$, we reject $H_0: \mu_1 \le \mu_2$ in favour of
$H_1: \mu_1 > \mu_2$.

(viii) Since $z_{0.01} = -2.326 < z = -2.351 < 2.326 = z_{0.99}$, we do not reject
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$.

(ix) Since $z = 2.453 > 2.326 = z_{0.99}$, we reject $H_0: \mu_1 - \mu_2 \le 20$ in favour
of $H_1: \mu_1 - \mu_2 > 20$. }

(b) A simple sample of heights of 6400 Englishmen has a mean of 67.85 inches and a
standard deviation of 2.56 inches, while a simple sample of heights of 6400 Australians
has a mean of 68.55 inches and a standard deviation of 2.52 inches. Do the data indicate
that Australians are on the average taller than the Englishmen?

{ Since $z = 15.589 > 1.645 = z_{0.95}$, we reject $H_0: \mu_2 \le \mu_1$ against $H_1: \mu_2 > \mu_1$
at $\alpha = 0.05$. }

2. (a) A random sample of 100 professors in private colleges showed an average monthly
salary of Rs. 5000 with a standard deviation of Rs. 200. Another random sample of
150 professors in Govt. Colleges showed an average monthly salary of Rs. 5600 with a
standard deviation of Rs. 250. Test the hypothesis that the average salary for the
professors teaching in Govt. Colleges does not exceed the average salary for professors
teaching in private colleges by more than Rs. 500. Use $\alpha = 0.01$.

{ Since $z = 3.499 > 2.326 = z_{0.99}$, we reject $H_0: \mu_2 - \mu_1 \le 500$ in favour of
$H_1: \mu_2 - \mu_1 > 500$. }

(b) A random sample of 80 light bulbs manufactured by company A had an average
lifetime of 1258 hours with a standard deviation of 94 hours, while a random sample
of 60 light bulbs manufactured by company B had an average lifetime of 1029 hours
with a standard deviation of 68 hours. Because of the high cost of bulbs from company
A, we are inclined to buy from company B unless the bulbs from company A will last
over 200 hours longer on the average than those from company B. Run a test using
$\alpha = 0.01$ to determine from whom we should buy our bulbs.

{ Since $z = 2.118 < 2.326 = z_{0.99}$, we do not reject $H_0: \mu_1 - \mu_2 \le 200$ against
$H_1: \mu_1 - \mu_2 > 200$. }

3. (a) A farmer claims the average yield of corn of variety A exceeds the average yield of
variety B by at least 12 bushels per acre. To test this claim, 50 acres of each variety
are planted and grown under similar conditions. Variety A yields on the average 86.7
bushels per acre with a standard deviation of 6.28 bushels per acre, while Variety B
yields on the average 77.8 bushels per acre with a standard deviation of 5.61 bushels
per acre. Test the farmer's claim using a 0.05 level of significance.

{ Since $z = -2.603 < -1.645 = z_{0.05}$, we reject $H_0: \mu_1 - \mu_2 \ge 12$ in favour of
$H_1: \mu_1 - \mu_2 < 12$. }

(b) An examination was given to two classes of 40 and 50 students respectively. In the
first class, the mean grade was 74 with a standard deviation of 8, while in the second class
the mean grade was 78 with a standard deviation of 7. Is there a significant
difference between the mean grades at 1% level?

{ Since $z_{0.005} = -2.576 < z = -2.490 < 2.576 = z_{0.995}$, we do not reject
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$. }

4. (a) A random sample of 100 light bulbs manufactured by company A had a mean lifetime
of 1230 hours with a standard deviation of 80 hours, while a random sample of 121
light bulbs manufactured by company B had a mean lifetime of 1200 hours with a
standard deviation of 66 hours. Are the mean lifetimes of bulbs manufactured by the two
companies significantly different?

{ Since $z = 3 > 1.960 = z_{0.975}$, we reject $H_0: \mu_1 = \mu_2$ in favour of
$H_1: \mu_1 \ne \mu_2$. }

(b) A random sample of 200 villages was taken from Faisalabad District and the average
population per village was found to be 498 with a standard deviation of 50. Another
sample of 200 villages from the same district gave an average population of 510 per
village with a standard deviation of 40. Is the difference between the averages of the two
samples statistically significant?

{ Since $z = -2.650 < -1.960 = z_{0.025}$, we reject $H_0: \mu_1 = \mu_2$ in favour of
$H_1: \mu_1 \ne \mu_2$. }

(c) The means of simple samples of 500 and 400 are 11.5 and 10.9 respectively. Can
the samples be regarded as drawn from a population of standard deviation 5?

{ Since $z_{0.025} = -1.960 < z = 1.789 < 1.960 = z_{0.975}$, we do not reject
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$. }

5. (a) Describe the procedure for testing the equality of means of two normal populations for:
(i) Large samples,
(ii) Small samples

(b) For each of the following sets of data, perform a test to decide whether there is a
significant difference between the means, $\mu_1$ and $\mu_2$, of the normal populations from
which the samples are drawn.


Pies [Bea [s | En [EF] tpt 
|) te |) oe 83 112 7S 


H 


{ (i) Since $t = 2.135 > 1.796 = t_{11;\,0.95}$, we reject $H_0: \mu_1 \le \mu_2$ in favour of
$H_1: \mu_1 > \mu_2$.
(ii) Since $t_{10;\,0.025} = -2.228 < t = -0.567 < 2.228 = t_{10;\,0.975}$, we do not
reject $H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$.
(iii) Since $t = 2.088 < 2.583 = t_{16;\,0.99}$, we do not reject $H_0: \mu_1 - \mu_2 \le 4$
against $H_1: \mu_1 - \mu_2 > 4$.
(iv) Since $t_{28;\,0.05} = -1.701 < t = 1.260 < 1.701 = t_{28;\,0.95}$, we do not reject
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$. }

(c) Eight pots, growing three barley plants each, were exposed to a high tension discharge
while nine similar pots were enclosed in an earthed wire cage. The numbers of tillers
(shoots) in each pot were as follows:
Caged _ 17a 825 me29 M2723 1728
Electrified 16 16 20 16 Zines 15 20
Discuss whether the electrification exercises any real effect on tillering.
{ Since $t = 3.058 > 1.753 = t_{15;\,0.95}$, we reject $H_0: \mu_1 \le \mu_2$ in favour of
$H_1: \mu_1 > \mu_2$ at $\alpha = 0.05$. }

6. (a) A random sample of 6 cows of breed A had daily milk yields in lb. of 16, 15, 18,
17, 19 and 17, and another random sample of 8 cows of breed B had daily milk
yields in lb. of 18, 22, 21, 23, 19, 20, 24 and 21. Test if breed B is better than
breed A at $\alpha = 0.05$.
{ Since $t = 4.162 > 1.782 = t_{12;\,0.95}$, we reject $H_0: \mu_2 \le \mu_1$ in favour of
$H_1: \mu_2 > \mu_1$. }

(b) The heights of six randomly selected sailors are in inches: 62, 64, 67, 68, 70 and 71.
Those of ten randomly selected soldiers are 62, 63, 65, 66, 69, 69, 70, 71, 72 and
73. Discuss in the light of these data whether soldiers are on the average taller than sailors.
Assume that the heights are normally distributed.
{ Since $t = 0.526 < 1.761 = t_{14;\,0.95}$, we do not reject $H_0: \mu_2 \le \mu_1$ against
$H_1: \mu_2 > \mu_1$ at $\alpha = 0.05$. }

7. (a) The weights in grams of 10 male and 10 female juvenile ring-necked pheasants are:
Males | 1293 1380 1614 1497 1340 1643 1466 1627 1383 1711
Females | 1061 1065 1092 1017 1021 1138 1143 1094 1270 1028

Test the hypothesis of a difference of 350 grams between population means in favour
of males against the alternative of a greater difference.

{ Since $t = 1.007 < 1.734 = t_{18;\,0.95}$, we do not reject $H_0: \mu_1 - \mu_2 \le 350$ against
$H_1: \mu_1 - \mu_2 > 350$ at $\alpha = 0.05$. }

(b) The following data are the gains in weight, measured in pounds, of babies from birth to
age six months. All babies in both groups weighed approximately the same at birth. The
babies in sample I were fed formula A, and babies in sample II were fed formula B.








(Assume that the experimenter has no preconceived notions about which formula might
be better).

Sample I Pinon ce 7. 8S
Sa] Oa a

Test at the 5% level of significance that the mean of population I equals the mean of
population II.

{ Since $t_{14;\,0.025} = -2.145 < t = -1.083 < 2.145 = t_{14;\,0.975}$, we do not reject
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$. }

8. (a) From the area planted in one variety of guayule, 54 plants were selected at random. Of
these plants, 15 were “Offtypes” and 12 were “Aberrant”. The rubber percentages
for these plants were:

Offtypes | 6.21, 5.70, 6.04, 4.47, 5.22, 4.45, 4.84, 5.88,
5.82, 6.09, 5.59, 6.06, 5.59, 6.74, 5.55.

Aberrant | nas mee 7 en 6 48 ue 7271 737.2 -7.20,- "7.06, - 6.40,
$93, ol; 251, 6.36.

Test the hypothesis of no difference between the two means. Also compute a 95%
confidence interval for the difference of two population means.
{ Since $t = -3.120 < -2.060 = t_{25;\,0.025}$, we reject $H_0: \mu_1 = \mu_2$ against
$H_1: \mu_1 \ne \mu_2$; $0.383 < \mu_2 - \mu_1 < 1.871$. }

(b) The I.Q.'s of 16 students from one area of a city showed a mean of 107 with a
standard deviation of 10, while the I.Q.'s of 14 students from another area of the city
showed a mean of 115 with a standard deviation of 8. Is there a significant difference
between the I.Q.'s of the two groups at (i) 0.01 and (ii) 0.05 level of significance?

{ (i) Since $t_{28;\,0.005} = -2.763 < t = -2.395 < 2.763 = t_{28;\,0.995}$, we do not reject
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$ at $\alpha = 0.01$.

(ii) Since $t = -2.395 < -2.048 = t_{28;\,0.025}$, we reject $H_0: \mu_1 = \mu_2$ against
$H_1: \mu_1 \ne \mu_2$ at $\alpha = 0.05$. }

9. (a) Means of random samples, each of size 10, from two normal populations with the same
standard deviation were found to be 16 and 20 respectively. Further, the sample
standard deviations were equal to 5 and 7 respectively. Test the hypothesis that the
populations have the same mean, using 0.05 level of significance.

{ Since $t_{18;\,0.025} = -2.101 < t = -1.47 < 2.101 = t_{18;\,0.975}$, we do not reject
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$. }

(b) A random sample of size $n_1 = 10$, selected from a normal population, has a mean
$\bar{x}_1 = 20$ and a standard deviation $s_1 = 5$. A second random sample of size $n_2 = 12$,
selected from a different normal population, has a mean $\bar{x}_2 = 24$ and a standard
deviation $s_2 = 6$. If $\mu_1 = 22$ and $\mu_2 = 19$ and $\sigma_1^2$ and $\sigma_2^2$ are unknown but
approximately equal, test whether there is any reason to doubt that $\mu_1 - \mu_2 = 3$.
{ Since $t = -2.934 < -2.086 = t_{20;\,0.025}$, we reject $H_0: \mu_1 - \mu_2 = 3$ in favour of
$H_1: \mu_1 - \mu_2 \ne 3$ at $\alpha = 0.05$. }

10. (a) The following values are obtained by two different samples while sampling a normal
population with $\mu = 3.0$ and $\sigma = 0.5$. Using these data solve the following:

A 27 +39 260 2832 rs Onno 70 341
BD. |.24n-281 03 OES Ses 7S Sms mn a 3.3
(i) Combine the data into one set and test the hypothesis $H_0: \mu = 3.0$ against
$H_1: \mu \ne 3.0$.

(ii) Assuming that $\sigma$ is unknown, test the hypothesis $H_0: \mu_1 \le \mu_2$ against
$H_1: \mu_1 > \mu_2$. Use $\alpha = 0.05$ in both cases.

{ (i) Since $z_{0.025} = -1.96 < z = 0.626 < 1.96 = z_{0.975}$, we do not reject
$H_0: \mu = 3$ against $H_1: \mu \ne 3$. (ii) Since $t = 0.694 < 1.734 = t_{18;\,0.95}$, we do
not reject $H_0: \mu_1 \le \mu_2$ against $H_1: \mu_1 > \mu_2$. }

(b) A group of 12 students are found to have the following I.Q.'s:
112, 109, 125, 113, 116, 131, 112, 123, 108, 113, 132, 128

Is it reasonable to assume that these students have come from a large population whose
mean I.Q. is 115?

Another group of 10 students are found to have the following I.Q.'s:

117, 110, 106, 109, 116, 119, 107, 106, 105, 108
Can we conclude that both the groups of students have come from the same population?
{ (i) Since $t_{11;\,0.025} = -2.201 < t = 1.386 < 2.201 = t_{11;\,0.975}$, we do not reject
$H_0: \mu = 115$ against $H_1: \mu \ne 115$ at $\alpha = 0.05$.
(ii) Since $t = 2.608 > 2.086 = t_{20;\,0.975}$, we reject $H_0: \mu_1 = \mu_2$ against
$H_1: \mu_1 \ne \mu_2$ at $\alpha = 0.05$. }
(c) A random sample of 16 values from a normal population gave a mean of 42 inches and
a sum of squared deviations from this mean as 135 (inches)². Test the hypothesis that the
mean in the population is 43.5 inches.

Another random sample of 9 values from another normal population gave a mean of
41.5 inches and a sum of squares of deviations from this mean as 128 (inches)². Test the
hypothesis that the mean of the first population equals the mean of the second population,
assuming that the variances of the two populations are equal.

{ Since $t_{15;\,0.025} = -2.131 < t = -2 < 2.131 = t_{15;\,0.975}$, we do not reject
$H_0: \mu = 43.5$ against $H_1: \mu \ne 43.5$.

Since $t_{23;\,0.025} = -2.069 < t = 0.355 < 2.069 = t_{23;\,0.975}$, we do not reject
$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$. }




13.5 INFERENCES ABOUT THE DIFFERENCE BETWEEN 
TWO POPULATION MEANS—DEPENDENT SAMPLES 


There are many situations that require matched-pair comparisons. This method is
appropriate when the observations of two dependent normal populations are compared. For
example, if we are making inferences about the difference between blood pressure before and
after administering a drug to heart patients, the blood pressure after the drug is taken is
dependent on the blood pressure before the drug is taken.


When matched samples are employed for making inference about the difference
between two population means $\mu_2 - \mu_1$, it turns out that the procedures simplify to those for a
single population mean $\mu$. Suppose that we have an extreme and simplified form of paired
observations with pairing of before and after measurements. To analyse paired experiments, we
consider for each pair the measurement before and after any conditions as variables X and Y,
respectively.
Suppose that X is a normal random variable with mean $\mu_1$ and variance $\sigma_1^2$, i.e.,
$X \sim N(\mu_1, \sigma_1^2)$; Y is a normal random variable with mean $\mu_2$ and variance $\sigma_2^2$, i.e.,
$Y \sim N(\mu_2, \sigma_2^2)$; X and Y are dependent. We wish to estimate $\mu_2 - \mu_1$.
We consider for each pair the difference between the random variables X and Y and
denote this random variable by D,
$$D = Y - X$$
We now consider the population of differences so obtained. We denote the mean of this
population of differences by $\mu_D$ and variance by $\sigma_D^2$, which are given by
$$\mu_D = \mu_2 - \mu_1$$
$$\sigma_D^2 = \sigma_2^2 + \sigma_1^2 - 2\rho\,\sigma_1\sigma_2$$
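
The variance formula for the differences has an exact sample counterpart, since $\rho\,\sigma_1\sigma_2 = \mathrm{Cov}(X, Y)$. The sketch below checks this numerically on the before and after productivity figures of Example 13.25. The helper names `mean` and `sample_cov` are assumptions introduced for the illustration.

```python
before = [54, 56, 50, 52, 55, 52, 56, 53, 53, 60]  # x values, Example 13.25
after = [60, 59, 57, 56, 56, 58, 62, 55, 54, 64]   # y values, Example 13.25

def mean(values):
    return sum(values) / len(values)

def sample_cov(u, v):
    """Sample covariance with divisor n - 1; sample_cov(u, u) is the sample variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

diffs = [y - x for x, y in zip(before, after)]

var_d_direct = sample_cov(diffs, diffs)
var_d_formula = (sample_cov(after, after) + sample_cov(before, before)
                 - 2 * sample_cov(before, after))

print(round(var_d_direct, 3), round(var_d_formula, 3))  # 4.889 4.889
```

The stronger the positive correlation within pairs, the larger the subtracted covariance term and the smaller the variance of the differences, which is why pairing can sharpen the comparison.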
A simple random sample of size n is selected from this population of differences. Let
$X_i$ and $Y_i$ denote the before and after measurements respectively, for the i-th object in the
random sample, so $(X_i, Y_i)$ is a matched pair of observations.

Thus $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ is a random sample of n paired
observations from the bivariate normal distribution with parameters given by $\mu_1 = E(X)$,
$\mu_2 = E(Y)$, $\sigma_1^2 = Var(X)$, $\sigma_2^2 = Var(Y)$, and $\rho = Corr(X, Y) = Cov(X, Y)/(\sigma_1\sigma_2)$.

The object is to make inferences about $\mu_D = \mu_2 - \mu_1$. To measure the change in
measurements of the i-th object, we use the difference

$$D_i = Y_i - X_i, \qquad i = 1, 2, \ldots, n,$$

then $D_1, D_2, \ldots, D_n$ are independently and identically distributed random variables
with common normal distribution having mean $\mu_D$ and variance $\sigma_D^2$.





Now these differences $D_i = Y_i - X_i$, $i = 1, 2, \ldots, n$, may be thought of as a
random sample of differences from a population of differences. We calculate the mean of the
sample differences $D_1, D_2, \ldots, D_n$, denoted by $\bar{D}$. This statistic $\bar{D}$ is a random variable that
has a sampling distribution with mean $\mu_D$. We make inferences about $\mu_D$ by the earlier procedures
for a single population mean, but we are really estimating $\mu_D = \mu_2 - \mu_1$, the difference between
the means of the two populations. The random variable

$$T = \frac{\bar{D} - \mu_D}{S_D/\sqrt{n}}$$

follows a t-distribution with $\nu = n - 1$ degrees of freedom, where

$$\bar{D} = \frac{\sum_{i=1}^{n} D_i}{n}, \qquad S_D^2 = \frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1} = \frac{\sum_{i=1}^{n} D_i^2 - n\bar{D}^2}{n - 1}$$





13.5.1 Test of Hypothesis about $\mu_D = \mu_2 - \mu_1$. Like the construction of confidence
intervals for $\mu_D = \mu_2 - \mu_1$, tests on $\mu_D = \mu_2 - \mu_1$ for the case of dependent (matched)
samples parallel those for a single population. The mean of the sample differences $\bar{D}$ corresponds to $\bar{X}$,
the mean of the population of differences $\mu_D = \mu_2 - \mu_1$ corresponds to $\mu$, and $\sigma_D$ corresponds to $\sigma$.

Small sample test of hypothesis $H_0: \mu_D = \delta_0$ is based on the test statistic

$$T = \frac{\bar{D} - \delta_0}{S_D/\sqrt{n}}$$

which has a t-distribution with $\nu = n - 1$ degrees of freedom.
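
The matched-pairs statistic reduces to a one-sample t on the differences, which the sketch below makes explicit. The function name `paired_t` is an assumption made here, and the data are the before and after productivity figures of Example 13.25.

```python
import math

def paired_t(before, after, delta0=0.0):
    """Paired t statistic for H0: mu_D = delta0, with D = after - before."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    d_bar = sum(diffs) / n
    s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
    t = (d_bar - delta0) / (s_d / math.sqrt(n))
    return t, n - 1  # observed value and degrees of freedom

before = [54, 56, 50, 52, 55, 52, 56, 53, 53, 60]
after = [60, 59, 57, 56, 56, 58, 62, 55, 54, 64]
t, df = paired_t(before, after)
print(round(t, 2), df)  # 5.72 9
```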


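The summary of the procedure of testing a hypothesis about the difference between means of two
normal populations with matched pairs of observations, at a given significance level $\alpha$, for the
three respective alternative hypotheses is shown in the following table.

Testing difference between means of two normal populations, Dependent samples:
Summary of procedure for 3 alternatives

Choose alternative hypothesis | $H_0: \mu_D \le \delta_0$, $H_1: \mu_D > \delta_0$ | $H_0: \mu_D \ge \delta_0$, $H_1: \mu_D < \delta_0$ | $H_0: \mu_D = \delta_0$, $H_1: \mu_D \ne \delta_0$
Set level of significance | $\alpha$ | $\alpha$ | $\alpha$
Test statistic | $T = \dfrac{\bar{D} - \delta_0}{S_D/\sqrt{n}}$ with $\nu = n - 1$ degrees of freedom
Decision rule | Reject $H_0$ if $T > t_{\nu;\,1-\alpha}$ | Reject $H_0$ if $T < t_{\nu;\,\alpha}$ | Reject $H_0$ if $T < t_{\nu;\,\alpha/2}$ or $T > t_{\nu;\,1-\alpha/2}$
Conclusion from the observed value $t$ | We reject $H_0$ if $t > t_{\nu;\,1-\alpha}$, otherwise we do not reject $H_0$ | We reject $H_0$ if $t < t_{\nu;\,\alpha}$, otherwise we do not reject $H_0$ | We reject $H_0$ if $t < t_{\nu;\,\alpha/2}$ or $t > t_{\nu;\,1-\alpha/2}$, otherwise we do not reject $H_0$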


To make valid inferences about the difference between means of two normal
populations with matched pairs of observations (dependent samples), the following assumptions
must be met.

(i) A random sample of n differences must be selected from the population of differences.
(ii) The population of differences must be (approximately) normally distributed.

The assumption of an underlying normal distribution can be relaxed when the sample
size is large. Applying the Central Limit Theorem to the differences $D_1, D_2, \ldots, D_n$ suggests a
nearly normal distribution of $(\bar{D} - \mu_D)/(S_D/\sqrt{n})$ when n is large (say greater than 30).


Example 13.24 A new weight reducing technique, consisting of a liquid protein diet, is
currently undergoing tests by the Food and Drug Administration (FDA) before its introduction
into the market. A typical test performed by the FDA is the following. The weights of a random
sample of 5 people are recorded before they are introduced to the liquid protein diet. The five
individuals are then introduced to follow the liquid protein diet for 4% weeks. At the end of this 
period, their weights (in pounds) are again recorded, The results are listed in the table. 


Weight before | __1- | ! 
3190185 IST. 
Perform a test of hypothesis at the 5% level of significance to determine whether the mean weight
is smaller after the diet is used than before the diet is used.
Solution. For the sample differences $d_i = y_i - x_i$ (after minus before) we have $\sum d_i = -25$ and
$\sum d_i^2 = 135$, so the mean and standard deviation of the sample differences are

$\bar{d} = \dfrac{\sum d_i}{n} = \dfrac{-25}{5} = -5$
$s_D = \sqrt{\dfrac{\sum d_i^2 - n\bar{d}^2}{n - 1}} = \sqrt{\dfrac{135 - 5(-5)^2}{5 - 1}} = 1.58$

The experimenter believes that the mean weight after is smaller than before, so the mean “after
minus before” difference $\mu_D$ is less than zero. Hence $\mu_D < 0$ is the alternative hypothesis. The
elements of the one-sided left tail test of hypothesis are:

Null hypothesis $H_0: \mu_D \ge 0$
Alternative hypothesis $H_1: \mu_D < 0$

Level of significance: $\alpha = 0.05$

Test statistic: $T = \dfrac{\bar{D} - \delta_0}{S_D/\sqrt{n}}$ follows a t-distribution under $H_0$ with

Degrees of freedom: $\nu = n - 1 = 5 - 1 = 4$

Critical value: $t_{\nu;\,\alpha} = t_{4;\,0.05} = -2.132$ (From Table 12)

Critical region: $T < -2.132$

Decision rule: Reject $H_0$ if $T < -2.132$, otherwise do not reject $H_0$.

Observed value: $t = \dfrac{\bar{d} - \delta_0}{s_D/\sqrt{n}} = \dfrac{-5 - 0}{1.58/\sqrt{5}} = -7.076$

Conclusion: Since $t = -7.076 < -2.132$, we reject $H_0$.
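
The computation above needs only n, $\sum d_i$ and $\sum d_i^2$. A minimal sketch follows; the function name `paired_t_from_sums` is an assumption made here, and $\sum d_i = -25$ is inferred from $\bar{d} = -5$ with $n = 5$. Carrying $s_D$ at full precision gives $t \approx -7.07$. The value $-7.076$ in the solution results from rounding $s_D$ to 1.58 before dividing.

```python
import math

def paired_t_from_sums(sum_d, sum_d2, n, delta0=0.0):
    """Paired t statistic from the sum and sum of squares of the differences."""
    d_bar = sum_d / n
    s_d = math.sqrt((sum_d2 - n * d_bar**2) / (n - 1))
    return (d_bar - delta0) / (s_d / math.sqrt(n))

t = paired_t_from_sums(-25, 135, 5)
print(round(t, 2))  # -7.07
```

Either value lies far below the critical value $-2.132$, so the decision is unaffected by the rounding.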


Example 13.25 Productivity (units produced per day) for a random sample of 10 workers
was recorded before and after training. The following paired observations were obtained.

Worker | 1 2 3 4 5 6 7 8 9 10
Productivity before | 54 56 50 52 55 52 56 53 53 60
Productivity after | 60 59 57 56 56 58 62 55 54 64
Perform a test of hypothesis at the 1 percent level of significance to determine if mean
productivity is greater after training than before training.

Solution. The mean and standard deviation of the sample differences are

Worker $i$ | Before $x_i$ | After $y_i$ | Difference (after minus before) $d_i = y_i - x_i$ | $(d_i - \bar{d})^2$
1 | 54 | 60 | 6 | 4
2 | 56 | 59 | 3 | 1
3 | 50 | 57 | 7 | 9
4 | 52 | 56 | 4 | 0
5 | 55 | 56 | 1 | 9
6 | 52 | 58 | 6 | 4
7 | 56 | 62 | 6 | 4
8 | 53 | 55 | 2 | 4
9 | 53 | 54 | 1 | 9
10 | 60 | 64 | 4 | 0
Total | | | $\sum d_i = 40$ | $\sum (d_i - \bar{d})^2 = 44$

$\bar{d} = \dfrac{\sum d_i}{n} = \dfrac{40}{10} = 4$
$s_D = \sqrt{\dfrac{\sum (d_i - \bar{d})^2}{n - 1}} = \sqrt{\dfrac{44}{10 - 1}} = 2.21$

The experimenter believes that the mean productivity after is greater than before, so the
population mean “after minus before” difference $\mu_D$ is greater than zero. Hence $\mu_D > 0$ is the
alternative hypothesis. The elements of the one-sided right tail test of hypothesis are:

Null hypothesis $H_0: \mu_D \le 0$
Alternative hypothesis $H_1: \mu_D > 0$

Level of significance: $\alpha = 0.01 \Rightarrow 1 - \alpha = 0.99$
Test statistic: $T = \dfrac{\bar{D} - \delta_0}{S_D/\sqrt{n}}$ follows a t-distribution under $H_0$ with

Degrees of freedom: $\nu = n - 1 = 10 - 1 = 9$

Critical value: $t_{\nu;\,1-\alpha} = t_{9;\,0.99} = 2.821$ (From Table 12)

Critical region: $T > 2.821$

Decision rule: Reject $H_0$ if $T > 2.821$, otherwise do not reject $H_0$.

Observed value: $t = \dfrac{\bar{d} - \delta_0}{s_D/\sqrt{n}} = \dfrac{4 - 0}{2.21/\sqrt{10}} = 5.72$

Conclusion: Since $t = 5.72 > 2.821$, we reject $H_0$.


Example 13.26 A company is interested in hiring a new secretary. Several candidates are
interviewed and the choice is narrowed to two possibilities. The final choice will be based on
typing ability. Six letters are randomly selected from the company's file, and each candidate is
required to type each one. The number of words typed per minute is recorded for each candidate.
The data are listed in the following table.


ener | 


Candidate A 

Candidate B 
Do the data provide sufficient evidence to indicate a difference in the mean number of words
typed per minute by the two candidates? Test using $\alpha = 0.02$.
Solution. The mean and standard deviation of the sample differences are

| Number of words typed by   | Difference  |
| Candidate A   Candidate B  | (A minus B) |
|                            |    d_i      |













$$\bar d = \frac{\sum d_i}{n} = \frac{16}{6} = 2.667$$

$$s_D = \sqrt{\frac{n\sum d_i^2 - (\sum d_i)^2}{n(n-1)}} = \sqrt{\frac{6(58) - (16)^2}{6(6-1)}} = 1.751$$

The experimenter believes that the mean typing rate differs for candidates A and B, so the
population mean "A minus B" difference $\mu_D$ is not equal to zero. Hence $\mu_D \ne 0$ is the
alternative hypothesis. The elements of the two-sided test of hypothesis are:

Null hypothesis $H_0: \mu_D = 0$
Alternative hypothesis $H_1: \mu_D \ne 0$

Level of significance: $\alpha = 0.02 \Rightarrow \alpha/2 = 0.01 \Rightarrow 1 - \alpha/2 = 0.99$

Test statistic: $T = \dfrac{\bar D - \delta_0}{S_D/\sqrt{n}}$ follows a t-distribution under $H_0$ with

Degrees of freedom: $\nu = n - 1 = 6 - 1 = 5$

Critical values: $t_{\nu;\,\alpha/2} = t_{5;\,0.01} = -3.365$,
$t_{\nu;\,1-\alpha/2} = t_{5;\,0.99} = 3.365$ (From Table 12)

Critical region: $T < -3.365$ or $T > 3.365$

Decision rule: Reject $H_0$ if $T < -3.365$ or $T > 3.365$, otherwise do not reject $H_0$.

Observed value: $t = \dfrac{\bar d - \delta_0}{s_D/\sqrt{n}} = \dfrac{2.667 - 0}{1.751/\sqrt{6}} = 3.731$

Conclusion: Since $t = 3.731 > 3.365$, we reject $H_0$.
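When only the sums of the differences are available, as in Example 13.26 ($n = 6$, $\sum d_i = 16$, $\sum d_i^2 = 58$), the same statistic can be obtained from the computing formula for $s_D$. The sketch below assumes a hypothetical helper named `paired_t_from_sums`; the critical value is the one quoted in the example. Without intermediate rounding the result is about 3.73, which agrees with the 3.731 of the text up to rounding.

```python
import math


def paired_t_from_sums(n, sum_d, sum_d2, delta0=0.0):
    """Paired t statistic from n, the sum of d_i and the sum of d_i squared."""
    d_bar = sum_d / n
    # Computing formula: s_D^2 = (n * sum d^2 - (sum d)^2) / (n (n - 1)).
    s_d = math.sqrt((n * sum_d2 - sum_d ** 2) / (n * (n - 1)))
    t = (d_bar - delta0) / (s_d / math.sqrt(n))
    return d_bar, s_d, t


d_bar, s_d, t = paired_t_from_sums(n=6, sum_d=16, sum_d2=58)
critical = 3.365  # t_{5; 0.99}, taken from the example (Table 12)
reject = abs(t) > critical  # two-sided test
print(f"d_bar = {d_bar:.3f}, s_D = {s_d:.3f}, t = {t:.2f}, reject H0: {reject}")
```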


Example 13.27 An experiment was performed with seven hop plants. One half of each plant
was pollinated and the other half was not pollinated. The yield of the seed of each hop plant is
tabulated as follows.

Plant number   Pollinated   Non-pollinated

Determine at the 5% level whether the pollinated half of the plant gives a higher yield in seed
than the non-pollinated half. State the assumptions and hypotheses to be tested and carry through
the computations to make a decision.


Solution. The mean and standard deviation of the sample differences are

| Yield in seed                | Difference                         |
| Pollinated   Non-pollinated  | (Pollinated minus Non-pollinated)  |

$$\bar d = \frac{\sum d_i}{n} = \frac{3.44}{7} = 0.491$$

$$s_D = \sqrt{\frac{n\sum d_i^2 - (\sum d_i)^2}{n(n-1)}} = \sqrt{\frac{7(1.9008) - (3.44)^2}{7(7-1)}} = 0.1872$$

The experimenter believes that the pollinated half gives a higher mean yield in seed,
so the alternative hypothesis is $\mu_D > 0$. The elements of the one-sided right tail test of
hypothesis are:

Null hypothesis $H_0: \mu_D \le 0$
Alternative hypothesis $H_1: \mu_D > 0$

Level of significance: $\alpha = 0.05 \Rightarrow 1 - \alpha = 0.95$

Test statistic: $T = \dfrac{\bar D - \delta_0}{S_D/\sqrt{n}}$ follows a t-distribution under $H_0$ with

Degrees of freedom: $\nu = n - 1 = 7 - 1 = 6$

Critical value: $t_{\nu;\,1-\alpha} = t_{6;\,0.95} = 1.943$ (From Table 12)

Critical region: $T > 1.943$

Decision rule: Reject $H_0$ if $T > 1.943$, otherwise do not reject $H_0$.

Observed value: $t = \dfrac{\bar d - \delta_0}{s_D/\sqrt{n}} = \dfrac{0.491 - 0}{0.1872/\sqrt{7}} = 6.939$

Conclusion: Since $t = 6.939 > 1.943$, we reject $H_0$ and conclude that the
data provide sufficient evidence that pollination gives a higher mean
yield in seed.










Exercise 13.5

1. (a) Haemoglobin values were determined on six patients before starting and after three
weeks on B12 therapy. The following data were obtained.

Individual | Haemoglobin (gm) | Haemoglobin (gm)
Number     | Before Therapy   | After Therapy


l 











Do the data indicate a significant improvement?
(Since $t = 5.927 > 2.015 = t_{5;\,0.95}$, we reject $H_0: \mu_D \le 0$ in favour of
$H_1: \mu_D > 0$)

(b) Eleven school boys were given a test in Drawing. They were given one month's further
tuition and a second test of equal difficulty was held at the end of it.


Marks in Ist test DSmeeUio 2k 18 20; 18 “17 23 16 19 
Marks in 2nd test Ame lois 20) 22’ 20° 20 20 20 STi 
Do the marks give evidence that the students have benefited by the extra coaching?
(Since $t = 0.956 < 1.812 = t_{10;\,0.95}$, we do not reject $H_0: \mu_D \le 0$ against
$H_1: \mu_D > 0$)
2. (a) A taxi company is trying to decide whether the use of radial tires instead of regular
belted tires improves fuel economy. Twelve cars were equipped with radial tires and
driven over a prescribed test course. Without changing drivers, the same cars were then
equipped with regular belted tires and driven once again over the test course. The
gasoline consumption, in km per litre, was recorded as follows:

Radial tires | 4.2  4.7  6.6  7.0  6.7  4.5  5.7  6.0  7.4  4.9  6.1  5.2
Belted tires | 4.1  4.9  6.2  6.9  6.8  4.4  5.7  5.8  6.9  4.7  6.0  4.9

At the 0.025 level of significance, can we conclude that cars equipped with radial tires
give better fuel economy than those equipped with belted tires? Assume the population
to be normally distributed.
(Since $t = 2.490 > 2.201 = t_{11;\,0.975}$, we reject $H_0: \mu_D \le 0$ in favour of
$H_1: \mu_D > 0$)


(b) A certain stimulus administered to each of the nine patients resulted in the following
increase in blood pressure:
5, 1, 8, 0, 3, 3, 5, -2, 4










and 72. Afte 


(i) 


>My 3 0 

Al: Un >0 

(i) Since $t = 0.863 < 1.717 = t_{22;\,0.95}$, we do not reject $H_0: \mu_1 - \mu_2 \le 0$
against $H_1: \mu_1 - \mu_2 > 0$.


(6) The time required by 10 Petsons to perform a task j 
4 mild stimulant are given j 





» assumed that the after Population w 
lower mean Use 5% level of Signifi 
(Since $t = -2.83 < -1.833 = t_{9;\,0.05}$, we reject $H_0: \mu_D \ge 0$ in favour of
$H_1: \mu_D < 0$.)


4. (a) Ten young recruits were put through a physical training programme by the army.
Weights were recorded before and after the training with the following results:
Recruit a IS 
Weight (Before) | 197 126 162 170. 143 205 168 175 © 497 
Weight (After) 135 200 160 182 147 200 172 186 193 
Using $\alpha = 0.05$, should we conclude that the training programme affects the average
weight of young recruits?
(Since $t_{9;\,0.025} = -2.262 < t = 1.471 < 2.262 = t_{9;\,0.975}$, we do not reject
$H_0: \mu_D = 0$ against $H_1: \mu_D \ne 0$)

(b) The following data give paired yields of two varieties of wheat. Each pair was planted in
a different locality.


Variety | 45 32 
Variety II 47 








Test the hypothesis that the mean yields are equal.

(Since $t = 6.725 > 2.262 = t_{9;\,0.975}$, we reject $H_0: \mu_D = 0$ in favour
of $H_1: \mu_D \ne 0$.)

5. (a) Twenty college freshmen were divided into 10 pairs, each member of the pair having 
approximately the same I.Q. One of each pair was selected at random and assigned to a 
Mathematics section using programmed materials only. The other members of each pair 
were assigned to a section in which the teacher lectured. At the end of the semester each 
group was given the same examination and the following results were recorded: 

Programmed materials Lectures 

a 76 

2 70 

3 85 

4 38 

5 9] 

6 75 

7 82 

8 64 

9 79 

10 88 
Test the hypothesis that there is no difference between the mean scores in the
"programmed material" and "lectures" populations. Use 5% level of significance.
(Since $t_{9;\,0.025} = -2.262 < t = 0.571 < 2.262 = t_{9;\,0.975}$, we do not reject
$H_0: \mu_D = 0$ against $H_1: \mu_D \ne 0$.)

(b) It is claimed that a new diet will reduce a person's weight by at least 10 pounds on the
average in a period of 2 weeks. The weights of 7 women who followed this diet were
recorded before and after a 2-week period.

Woman           |   1    2    3    4    5    6    7
Weight (Before) | 129  133  136  152  141  138  125
Weight (After)  | 130  121  128  137  129  132  120

Test the manufacturer's claim at a 5% level of significance for the mean difference in
weights. Assume the distribution of weights before and after to be approximately
normal.

(Since $t = 0.910 < 1.943 = t_{6;\,0.95}$, we do not reject $H_0: \mu_D \le -10$ against
$H_1: \mu_D > -10$)




13.6 TEST OF HYPOTHESIS ABOUT THE DIFFERENCE
BETWEEN TWO POPULATION PROPORTIONS, $\pi_1 - \pi_2$


There are many economic and management problems where we must decide whether
observed differences between two sample proportions come from common or different
populations. Such problems require a comparison between the rates of incidence of a
characteristic in two populations.

13.6.1 Forms of Hypothesis. We are interested in tests about the parameter $\pi_1 - \pi_2$. Let $\delta_0$
be the hypothesized value of the difference between two population proportions, then the three
possible null hypotheses about the difference between population proportions, and their
corresponding alternative hypotheses, are:

1. $H_0: \pi_1 - \pi_2 \ge \delta_0$ against $H_1: \pi_1 - \pi_2 < \delta_0$

2. $H_0: \pi_1 - \pi_2 \le \delta_0$ against $H_1: \pi_1 - \pi_2 > \delta_0$

3. $H_0: \pi_1 - \pi_2 = \delta_0$ against $H_1: \pi_1 - \pi_2 \ne \delta_0$

Depending upon the alternative hypothesis, a one-sided left tail test is required for (1),
(2) requires a one-sided right tail test, and (3) requires a two-tail test.

13.6.2 Test based on Normal Distributions. The unknown proportions of elements possessing
the particular characteristic in population I and in population II are denoted by $\pi_1$ and $\pi_2$,
respectively. A random sample of size $n_1$ is taken from population I and the number of successes
is denoted by $X_1$. An independent random sample of size $n_2$ is taken from population II and the
number of successes is denoted by $X_2$. The sample proportions are:

$$P_1 = \frac{X_1}{n_1} \quad \text{and} \quad P_2 = \frac{X_2}{n_2}.$$

An intuitively appealing estimator for $\pi_1 - \pi_2$ is the difference between the sample proportions
$P_1 - P_2$. When testing hypotheses about $\pi_1 - \pi_2$, we will use the sampling distribution of
$P_1 - P_2$. The sampling distribution of $P_1 - P_2$ will have mean and standard error

$$\mu_{P_1 - P_2} = \pi_1 - \pi_2 \quad \text{and} \quad \sigma_{P_1 - P_2} = \sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}$$

For large sample sizes $n_1$ and $n_2$, the random variable

$$Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{\dfrac{\pi_1(1 - \pi_1)}{n_1} + \dfrac{\pi_2(1 - \pi_2)}{n_2}}}$$

is approximately standard normal. The estimate of the standard error of $P_1 - P_2$ can be obtained
by replacing $\pi_1$ and $\pi_2$ by their sample estimates $P_1$ and $P_2$ as

$$\hat\sigma_{P_1 - P_2} = \sqrt{\frac{P_1(1 - P_1)}{n_1} + \frac{P_2(1 - P_2)}{n_2}}$$




The random variable Z then becomes

$$Z = \frac{(P_1 - P_2) - (\pi_1 - \pi_2)}{\sqrt{\dfrac{P_1(1 - P_1)}{n_1} + \dfrac{P_2(1 - P_2)}{n_2}}}$$

(a) Testing the Hypothesis that the Difference between Two Population Proportions
Equals Some Non-zero Value. To test the more general hypothesis $H_0: \pi_1 - \pi_2 = \delta_0$, we use
the test statistic

$$Z = \frac{(P_1 - P_2) - \delta_0}{\sqrt{\dfrac{P_1(1 - P_1)}{n_1} + \dfrac{P_2(1 - P_2)}{n_2}}}$$
(b) Testing the Hypothesis that Two Independent Populations have Same Proportion of
Successes. Our aim is to test the null hypothesis of no difference $H_0: \pi_1 = \pi_2$. Under the null
hypothesis $H_0: \pi_1 - \pi_2 = 0$, the two populations have equal proportions $\pi_1 = \pi_2$, and we denote
the unspecified common population proportion by $\pi$.
Since under the null hypothesis it is assumed that $\pi_1 = \pi_2 = \pi$, the sampling distribution of
$P_1 - P_2$ will have mean

$$\mu_{P_1 - P_2} = \pi - \pi = 0$$

and standard error

$$\sigma_{P_1 - P_2} = \sqrt{\frac{\pi(1 - \pi)}{n_1} + \frac{\pi(1 - \pi)}{n_2}} = \sqrt{\pi(1 - \pi)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

The unknown population proportion $\pi$ involved in the standard error must now be replaced by
the sample proportion.

The difficulty in making this replacement lies in the fact that we have two estimates of $\pi$, $P_1$ and
$P_2$, since two different samples were collected. Which of these two estimates should be used to
estimate the unknown population proportion $\pi$?

Since we wish to obtain the best estimate available, it would seem reasonable to use an estimator
that would pool the information from the two samples.

The proportion of successes in the combined sample provides the pooled estimate.

$$\hat\pi = \frac{X_1 + X_2}{n_1 + n_2} = \frac{n_1 P_1 + n_2 P_2}{n_1 + n_2}$$

The estimate of standard error then becomes

$$\hat\sigma_{P_1 - P_2} = \sqrt{\hat\pi(1 - \hat\pi)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$



For large sample sizes $n_1$ and $n_2$, the random variable

$$Z = \frac{P_1 - P_2}{\sqrt{\hat\pi(1 - \hat\pi)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$

is approximately standard normal under the null hypothesis $H_0: \pi_1 - \pi_2 = 0$.


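The summary of the procedure of testing a hypothesis that the difference between two population
proportions equals some non-zero value, at a given significance level $\alpha$, for the three respective
alternative hypotheses is shown in the following table.


Testing that the Difference between Two Population Proportions Equals Some
Non-zero Value: Summary of procedure for 3 alternatives

Step                             | Left tail test                    | Right tail test                   | Two-tail test
Choose alternative hypothesis    | $H_1: \pi_1 - \pi_2 < \delta_0$   | $H_1: \pi_1 - \pi_2 > \delta_0$   | $H_1: \pi_1 - \pi_2 \ne \delta_0$
Set level of significance        | $\alpha$                          | $\alpha$                          | $\alpha$
Critical region: reject $H_0$ if | $Z < z_{\alpha}$                  | $Z > z_{1-\alpha}$                | $Z < z_{\alpha/2}$ or $Z > z_{1-\alpha/2}$
Decision: we reject $H_0$ if     | $z < z_{\alpha}$                  | $z > z_{1-\alpha}$                | $z < z_{\alpha/2}$ or $z > z_{1-\alpha/2}$
                                 | otherwise we do not reject $H_0$  | otherwise we do not reject $H_0$  | otherwise we do not reject $H_0$

Test statistic (all three alternatives):

$$Z = \frac{(P_1 - P_2) - \delta_0}{\sqrt{\dfrac{P_1(1 - P_1)}{n_1} + \dfrac{P_2(1 - P_2)}{n_2}}}$$

Calculate the observed value of the test statistic:

$$z = \frac{(p_1 - p_2) - \delta_0}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}}$$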





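The three decision rules of the summary can be sketched as a single Python function. This is only an illustration: the function name `z_decision` and the labels `"less"`, `"greater"` and `"two-sided"` are hypothetical, and the quantiles come from the standard library's `statistics.NormalDist` instead of Table 10.

```python
from statistics import NormalDist


def z_decision(z, alpha, alternative):
    """Return True if H0 is rejected for the observed value z.

    alternative is 'less' (left tail), 'greater' (right tail)
    or 'two-sided'.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "less":
        return z < std_normal.inv_cdf(alpha)          # z < z_alpha
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)      # z > z_{1 - alpha}
    if alternative == "two-sided":
        # By symmetry z_{alpha/2} = -z_{1 - alpha/2}.
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Illustrative observed values with their significance levels.
print(z_decision(-0.287, 0.06, "less"))      # left tail test
print(z_decision(2.61, 0.05, "greater"))     # right tail test
print(z_decision(1.270, 0.05, "two-sided"))  # two-tail test
```

The same function applies to both summaries, since they differ only in how the observed value z is computed, not in the critical regions.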





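The summary of the procedure of testing a hypothesis that two independent populations have
same proportion of successes, at a given significance level $\alpha$, for the three respective alternative
hypotheses is shown in the following table.


Testing that Two Independent Populations have Same Proportion of Successes:
Summary of procedure for 3 alternatives

Step                             | Left tail test                    | Right tail test                   | Two-tail test
Choose alternative hypothesis    | $H_1: \pi_1 - \pi_2 < 0$          | $H_1: \pi_1 - \pi_2 > 0$          | $H_1: \pi_1 - \pi_2 \ne 0$
Set level of significance        | $\alpha$                          | $\alpha$                          | $\alpha$
Critical region: reject $H_0$ if | $Z < z_{\alpha}$                  | $Z > z_{1-\alpha}$                | $Z < z_{\alpha/2}$ or $Z > z_{1-\alpha/2}$
Decision: we reject $H_0$ if     | $z < z_{\alpha}$                  | $z > z_{1-\alpha}$                | $z < z_{\alpha/2}$ or $z > z_{1-\alpha/2}$
                                 | otherwise we do not reject $H_0$  | otherwise we do not reject $H_0$  | otherwise we do not reject $H_0$

Test statistic (all three alternatives):

$$Z = \frac{P_1 - P_2}{\sqrt{\hat\pi(1 - \hat\pi)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \qquad \hat\pi = \frac{X_1 + X_2}{n_1 + n_2} = \frac{n_1 P_1 + n_2 P_2}{n_1 + n_2}$$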




Example 13.28 A cigarette manufacturing firm distributes two brands of cigarettes. It is found
that 56 of 200 smokers prefer brand "A" and that 29 of 150 smokers prefer brand "B".
Test at 0.06 level of significance that brand "A" outsells brand "B" by at least 10% against
the alternative hypothesis that the difference is less than 10%.

Solution. We have $n_1 = 200$, $x_1 = 56$, $p_1 = \dfrac{x_1}{n_1} = \dfrac{56}{200} = 0.28$

$n_2 = 150$, $x_2 = 29$, $p_2 = \dfrac{x_2}{n_2} = \dfrac{29}{150} = 0.193$

The objective of the sampling is to attempt to support the research (alternative) hypothesis
that the difference between the two proportions of smokers $\pi_1 - \pi_2$ is less than 0.1.
Consequently, we will test the null hypothesis that $\pi_1 - \pi_2 \ge 0.1$ against the alternative
hypothesis that $\pi_1 - \pi_2 < 0.1$.

The elements of the one-sided left tail test of hypothesis are:

Null hypothesis $H_0: \pi_1 - \pi_2 \ge 0.1$ (i.e., $\delta \ge 0.1$)
Alternative hypothesis $H_1: \pi_1 - \pi_2 < 0.1$ (i.e., $\delta < 0.1$)

Level of significance: $\alpha = 0.06$

Test statistic: $Z = \dfrac{(P_1 - P_2) - \delta_0}{\sqrt{\dfrac{P_1(1 - P_1)}{n_1} + \dfrac{P_2(1 - P_2)}{n_2}}}$ follows approximately the standard normal distribution

Critical value: $z_{\alpha} = z_{0.06} = -1.555$ { From Table 10 (a) }

Critical region: $Z < -1.555$

Decision rule: Reject $H_0$ if $Z < -1.555$, otherwise do not reject $H_0$.

Observed value:

$$z = \frac{(p_1 - p_2) - \delta_0}{\sqrt{\dfrac{p_1(1 - p_1)}{n_1} + \dfrac{p_2(1 - p_2)}{n_2}}} = \frac{(0.28 - 0.193) - 0.10}{\sqrt{\dfrac{0.28(1 - 0.28)}{200} + \dfrac{0.193(1 - 0.193)}{150}}} = -0.287$$

Conclusion: Since $z = -0.287 > -1.555 = z_{0.06}$, we do not reject
$H_0: \pi_1 - \pi_2 \ge 0.10$. The data do not present sufficient evidence that the
difference is less than 10%, so the claim that brand A outsells brand B by at least 10% stands.
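The observed value in Example 13.28 can be reproduced with the following Python sketch. The function name `two_prop_z_unpooled` is a hypothetical helper, not something defined in the text. Because the code keeps $p_2 = 29/150$ unrounded, it gives about $-0.29$ where the example, using $p_2 = 0.193$, obtains $-0.287$; the decision is the same.

```python
import math


def two_prop_z_unpooled(x1, n1, x2, n2, delta0):
    """z statistic for H0: pi1 - pi2 = delta0, using separate sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    # Estimated standard error of P1 - P2 (no pooling, since delta0 is not zero).
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return ((p1 - p2) - delta0) / se


z = two_prop_z_unpooled(x1=56, n1=200, x2=29, n2=150, delta0=0.10)
critical = -1.555  # z_{0.06}, taken from the example (Table 10 (a))
reject = z < critical  # one-sided left tail test
print(f"z = {z:.3f}, reject H0: {reject}")
```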


Example 13.29 A firm found with the help of a sample survey (size of sample 900) that 0.75
of the population consumes things produced by them. The firm then advertised the goods in paper,
and a later sample of size 1000 showed that the proportion consuming the
goods produced by them is now 0.8. Is this rise significant, indicating that the advertisement
was effective?

Solution. We have $n_1 = 900$, $p_1 = 0.75$, $n_2 = 1000$, $p_2 = 0.80$

$$\hat\pi = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{900(0.75) + 1000(0.80)}{900 + 1000} = 0.776$$

The objective of the sampling is to attempt to support the research (alternative) hypothesis
that the proportion of consumers after advertisement $\pi_2$ is greater than that of before $\pi_1$.
Consequently, we will test the null hypothesis that $\pi_2 \le \pi_1$ against the alternative hypothesis
that $\pi_2 > \pi_1$.

The elements of the one-sided right tail test of hypothesis are:

Null hypothesis $H_0: \pi_2 \le \pi_1 \Rightarrow \pi_2 - \pi_1 \le 0$ (i.e., $\delta \le 0$)

Alternative hypothesis $H_1: \pi_2 > \pi_1 \Rightarrow \pi_2 - \pi_1 > 0$ (i.e., $\delta > 0$)

Level of significance: $\alpha = 0.05 \Rightarrow 1 - \alpha = 0.95$

Test statistic: $Z = \dfrac{P_2 - P_1}{\sqrt{\hat\pi(1 - \hat\pi)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ follows approximately the
standard normal distribution under $H_0$

Critical value: $z_{1-\alpha} = z_{0.95} = 1.645$ { From Table 10 (b) }

Critical region: $Z > 1.645$

Decision rule: Reject $H_0$ if $Z > 1.645$, otherwise do not reject $H_0$.

Observed value:

$$z = \frac{(p_2 - p_1) - 0}{\sqrt{\hat\pi(1 - \hat\pi)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{(0.80 - 0.75) - 0}{\sqrt{0.776(1 - 0.776)\left(\dfrac{1}{900} + \dfrac{1}{1000}\right)}} = 2.61$$

Conclusion: Since $z = 2.61 > 1.645 = z_{0.95}$, we reject $H_0$ and conclude that the
data present sufficient evidence to indicate that the consumption after
advertisement is higher than that of before.
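A minimal Python sketch of the pooled test, applied to the figures of Example 13.29, is given below. The function name `two_prop_z_pooled` and its argument order (the sample expected to have the larger proportion first) are assumptions made for this illustration.

```python
import math


def two_prop_z_pooled(p_a, n_a, p_b, n_b):
    """Return (pooled estimate, z) for H0: pi_a - pi_b = 0."""
    # Pooled estimate of the common proportion under H0.
    pooled = (n_a * p_a + n_b * p_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return pooled, (p_a - p_b) / se


# "After" sample first, so that z estimates pi_2 - pi_1 as in the example.
pooled, z = two_prop_z_pooled(p_a=0.80, n_a=1000, p_b=0.75, n_b=900)
critical = 1.645  # z_{0.95}, taken from the example (Table 10 (b))
reject = z > critical  # one-sided right tail test
print(f"pooled = {pooled:.3f}, z = {z:.2f}, reject H0: {reject}")
```

Pooling is appropriate here only because the hypothesized difference is zero; for a non-zero hypothesized difference the separate sample proportions are used in the standard error.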





Exercise 13.6

1. (a) For each of the following sets of data, test the hypothesis that there is a common
proportion $\pi$.











Level of 
size Guscenes significance 









(i) 150 

(ii) 1000 542 1% 

(iii) 100 

(iv) 

{ (i) Since $z = -1.245 > -1.645 = z_{0.05}$, we do not reject $H_0: \pi_1 \ge \pi_2$
against $H_1: \pi_1 < \pi_2$.

(ii) Since $z = 2.941 > 2.576 = z_{0.995}$, we reject $H_0: \pi_1 = \pi_2$ in favour
of $H_1: \pi_1 \ne \pi_2$.

(iii) Since $z_{0.05} = -1.645 < z = -0.573 < 1.645 = z_{0.95}$, we do not reject
$H_0: \pi_1 = \pi_2$ against $H_1: \pi_1 \ne \pi_2$.

(iv) Since $z = 1.373 < 1.645 = z_{0.95}$, we do not reject $H_0: \pi_1 \le \pi_2$ against
$H_1: \pi_1 > \pi_2$. }

(b) In a population that has a certain minor blood disorder, samples of 100 males and 100
females are taken. It is found that 31 males and 24 females have the blood disorder.
Can we conclude at 0.01 level of significance that the proportion of men who have the blood
disorder is greater than the proportion of women?

{ Since $z = 1.109 < 2.326 = z_{0.99}$, we do not reject $H_0: \pi_1 \le \pi_2$ against
$H_1: \pi_1 > \pi_2$. }

2. (a) The records of a hospital show that 52 men in a sample of 1000 men versus 23
women in a sample of 1000 women were admitted because of heart disease. Do these
data present sufficient evidence to indicate a higher rate of heart disease among men
admitted to the hospital?
{ Since $z = 3.413 > 1.645 = z_{0.95}$, we reject $H_0: \pi_1 \le \pi_2$ in favour of
$H_1: \pi_1 > \pi_2$. }




(b) In a study to estimate the proportion of housewives who own an automatic dryer, it is
found that 63 of 100 urban residents have a dryer and 59 of 125 suburban residents
own a dryer. Is there a significant difference between the proportions of urban and
suburban housewives who own an automatic dryer? (Use a 0.04 level of significance.)
{ Since $z = 2.364 > 2.0537 = z_{0.98}$, we reject $H_0: \pi_1 = \pi_2$ in favour of
$H_1: \pi_1 \ne \pi_2$. }

3. (a) A random sample of 150 light bulbs manufactured by a firm A showed 12 defective
bulbs while a random sample of 100 light bulbs manufactured by another firm B
showed 4 defective bulbs. Is there a significant difference between the proportions of
defectives of the two firms?

{ Since $z_{0.025} = -1.960 < z = 1.270 < 1.960 = z_{0.975}$, we do not reject
$H_0: \pi_1 = \pi_2$ against $H_1: \pi_1 \ne \pi_2$. }

(b) In a random sample of 800 adults from the population of a large city, 600 are found to
be smokers. In a random sample of 1000 adults from another large city, 700 are
smokers. Do the data indicate that the cities are significantly different with respect to the
prevalence of smoking among men?

{ Since $z = 2.353 > 1.960 = z_{0.975}$, we reject $H_0: \pi_1 = \pi_2$ in favour of
$H_1: \pi_1 \ne \pi_2$. }

(c) In two large populations, there are 30% and 25% respectively of fair haired people. Is
this difference likely to be hidden in samples of 1200 and 900 respectively from the
two populations?

{ Since $z = 2.529 > 1.960 = z_{0.975}$, we reject $H_0: \pi_1 = \pi_2$ in favour of
$H_1: \pi_1 \ne \pi_2$ at $\alpha = 0.05$. }

4. (a) A machine puts out 16 imperfect articles in a sample of 500. After the machine is
overhauled, it puts out 3 imperfect articles in a batch of 100. Has the machine been
improved? Use 5% level of significance.
(Since $z = -0.104 > -1.645 = z_{0.05}$, we do not reject $H_0: \pi_1 \ge \pi_2$ against
$H_1: \pi_1 < \pi_2$)

(b) In a random sample of 250 persons who skipped breakfast, 102 reported that they
experienced mid morning fatigue, and in a random sample of 250 persons who ate
breakfast, 73 reported that they experienced mid morning fatigue. Use 0.01 level of
significance to test the null hypothesis that there is no difference between the
corresponding proportions against the alternative hypothesis that mid morning
fatigue is more prevalent among persons who skip breakfast.

(Since $z = 2.719 > 2.326 = z_{0.99}$, we reject $H_0: \pi_1 \le \pi_2$ in favour of
$H_1: \pi_1 > \pi_2$)

5. (a) A manufacturer of house-dresses sent out advertising by mail. He sent samples of
material to each of two groups of 1000 women. For one group he enclosed a white
return envelope and for the other group, a blue envelope. He received orders from 9%
and 12% respectively. Is it quite certain that the blue envelope will help sales?


Use $\alpha = 0.05$.
(Since $z = 2.19 > 1.645 = z_{0.95}$, we reject $H_0: \pi_2 \le \pi_1$ in favour of $H_1: \pi_2 > \pi_1$)

(b) A random sample of 150 high school students was asked whether they would turn to
their father or their mother for help with a homework assignment in Mathematics and
another random sample of 150 high school students was asked the same question with
regard to a homework assignment in English. Use the results shown in the following table
and the 0.01 level of significance to test whether or not there is a difference between
the true proportions of high school students who turn to their fathers rather than their
mothers for help in these two subjects:

         | Mathematics | English
Mother:  |     59      |   85
Father:  |     91      |   65

(Since $z = 3.016 > 2.576 = z_{0.995}$, we reject $H_0: \pi_1 = \pi_2$ in favour of
$H_1: \pi_1 \ne \pi_2$)
Exercise 13.7
Objective Questions
1. Fill in the blanks.
(i) A statistical __________ is an assertion about the distribution of one or more random variables. (hypothesis)
(ii) A statistical hypothesis __________ is a procedure to determine whether or not an assumption about some parameter of a population is supported by an observed random sample. (testing)
(iii) A __________ hypothesis is that hypothesis which is tested for possible rejection under the assumption that it is true. (null)
(iv) An __________ hypothesis is that hypothesis which we are willing to accept when the null hypothesis is rejected. (alternative)
(v) The __________ hypothesis always contains some form of an equality sign. (null)
(vi) The __________ hypothesis never contains the sign of equality and is always in an inequality form. (alternative)



(vii) A statistic on the basis of which a decision is made about the hypotheses of interest is called __________. (test statistic)
(viii) A __________ region specifies a set of values of the test statistic for which $H_0$ is rejected. (rejection)
(ix) An __________ region specifies a set of values of the test statistic for which $H_0$ is not rejected. (acceptance)
(x) The values of the test statistic which separate the rejection region from the acceptance region are called __________ values. (critical)
(xi) If the critical region is located equally in both tails of the sampling distribution of the test statistic, the test is called a __________ test. (two-tailed)
(xii) If the critical region is located in only one tail of the sampling distribution of the test statistic, the test is called a __________ test. (one-tailed)


2. Fill in the blanks.
(i) A value of the test statistic is said to be statistically __________ if it falls in the acceptance region. (insignificant)
(ii) A value of the test statistic is said to be statistically significant if it falls in the __________ region. (critical)
(iii) If the null hypothesis is false, we may accept it leading to a __________ decision. (wrong)
(iv) If the null hypothesis is false, we may reject it leading to a __________ decision. (correct)
(v) A __________ error is made by rejecting $H_0$ if $H_0$ is actually true. (Type-I)
(vi) A __________ error is made by accepting $H_0$ if $H_1$ is actually true. (Type-II)
(vii) The level of __________ of a test is the maximum probability with which we are willing to risk a Type-I error. (significance)







(viii) The level of __________ is the probability of accepting a true null hypothesis. (confidence)
(ix) The __________ of freedom is the number of independent or freely chosen variables. (degrees)

3. Mark off the following statements as True or False.
(i) The types of statistical inferences are estimation of parameters and testing of hypotheses. (true)
(ii) A null hypothesis is rejected when a test statistic has a value that is not consistent with the null hypothesis. (true)
(iii) The probability of accepting the true null hypothesis is called the level of significance. (false)
(iv) The probability of rejecting the true null hypothesis is called the level of confidence. (false)
(v) The probability of accepting the true null hypothesis is called the level of confidence. (true)
(vi) An assumption made about the population parameter which may or may not be true is called a statistical hypothesis. (true)
(vii) The null hypothesis and alternative hypothesis are complementary to each other. (true)
(viii) The null hypothesis $H_0$ always contains some form of an equality sign. (true)
(ix) The alternative $H_1$ never contains the sign of equality. (true)

4. Mark off the following statements as True or False.
(i) The level of significance is the probability of accepting a null hypothesis when it is true. (false)
(ii) A null hypothesis is rejected if the value of the test statistic is consistent with $H_0$. (false)
(iii) The Type-I error is considered more serious than a Type-II error. (true)
(iv) A value of the test statistic is said to be statistically insignificant if it falls in the rejection region. (false)


(v) A value of the test statistic is said to be statistically significant if it falls in the rejection region. (true)
(vi) If the critical region is located in only one tail of the sampling distribution of the test statistic, then it is called a one-tailed or one-sided test. (true)
(vii) If the critical region is equally located in both tails of the sampling distribution of the test statistic, then it is called a two-tailed or two-sided test. (true)
(viii) The degrees of freedom is the number of dependent variables. (false)
(ix) The t-distribution approaches the normal distribution as the sample size increases. (true)
(x) The standardized normal distribution has smaller dispersion than Student's t-distribution. (true)
(xi) $H_0$ is rejected when the probability of its occurrence is equal to or less than the level of significance. (true)





SIMPLE LINEAR 
REGRESSION 
AND CORRELATION





14.1 RELATIONS BETWEEN VARIABLES 


The concept of a relation between two variables such as family incomes and family 
expenditures for housing, is a familiar one. We now distinguish between a functional relation 
and a statistical relation, and consider each of them in turn. 


14.1.1 Functional Relation between Two Variables. A functional relation between two
variables is a perfect relation, where the value of the dependent variable is uniquely determined
from the value of the independent variable. A functional relation is expressed by a mathematical
formula. If x is the independent variable and y is the dependent variable, a functional relation
is of the form

$$y = f(x)$$

Given a particular value of x, the function f(x) gives the corresponding value of y. The
observations, when plotted on a graph, all fall directly on the line or curve of the functional
relationship. This is the main characteristic of all functional relationships.


14.1.2 Statistical Relation between Two Variables. A statistical relation is a relation where
the value of the dependent variable is not uniquely determined when the level of the independent
variable is specified. A statistical relation, unlike a functional relation, is not exact. The value of
y is not uniquely determined from knowledge of x. The observations, when plotted on a graph,
do not fall directly on the line or curve of the relationship. This is the main characteristic of all
statistical relationships.
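The contrast between the two kinds of relation can be illustrated with a small Python sketch. Everything in it is hypothetical: the line $y = 2x + 1$, the noise standard deviation 0.5 and the seed are chosen only for illustration and do not come from the text.

```python
import random


def f(x):
    """A hypothetical functional relation y = f(x)."""
    return 2.0 * x + 1.0


xs = [1, 2, 3, 4, 5]

# Functional relation: every observation lies exactly on the curve.
functional_y = [f(x) for x in xs]

# Statistical relation: the same curve plus a random deviation, so y is
# not uniquely determined by x.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
statistical_y = [f(x) + rng.gauss(0.0, 0.5) for x in xs]

functional_dev = [y - f(x) for x, y in zip(xs, functional_y)]
statistical_dev = [y - f(x) for x, y in zip(xs, statistical_y)]

print("deviations, functional relation: ", functional_dev)
print("deviations, statistical relation:", statistical_dev)
```

The deviations from the curve are all zero in the first case and scattered around zero in the second, which is the graphical distinction described above.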


In many fields such as business, economics and administration, exact relations are not
generally observed among the variables, but rather statistical relationships prevail. For example:
(i) The grade point Y secured by a student in the college is undoubtedly related to his grade
point x secured in the school. (ii) The consumption expenditure Y of a household is related to
its income x. (iii) The maintenance cost Y per year for an automobile is related to its age x.
(iv) The yield Y of wheat is related to the quantity x of a fertilizer. (v) The amount of sales Y
of a newly produced item may be related to its advertising cost x. (vi) The weight Y of a baby
is certainly related to his age x. (vii) The saving Y of a person or a firm is related to his/its
income x. (viii) The height Y of a son is undoubtedly related to the height x of his father, etc.


Causal Relation. Another factor to consider is whether a causal relationship exists between two
variables. In the example of steel output and labour input, it is clear that a causal relationship
does exist. The number of workers will influence the number of tons of steel produced. There is
also a causal relationship between hours of sunshine and the rate of growth of tulips. Conversely,
it is less clear that more steel output will cause a rise in the number of workers, nor will we
increase the sunshine by forcing tulips to grow faster. It is important to note that regression analysis and
correlation analysis make no assertions about causality.




14.2 REGRESSION ANALYSIS 


One of the most common and important tasks that statisticians must face is to determine
the existence and nature of relationships between variables in a problem. We are interested in
relationships between variables because we may often possess information about some variables
and wish to use that information to draw conclusions about another variable. In many situations,
we face problems that involve two or more variables and we are to make inferences about
how the changes in one variable are related to the changes in other variables, and how one set of
variables is considered to predict or account for the other variable. These problems can be dealt
with by measuring statistical relationships between variables, representing the relationships in
mathematical (functional) form and evaluating the significance of the relationships.


Regression analysis provides a method of estimating an average relationship (often
linear) between two or more variables, which allows the investigator to explain and predict, and
this is, in a sense, the best possible approximation. Regression analysis provides an equation
that can be used for estimating the average value of one variable from given values of other
variables.


14.2.1 Simple Regression. The simple regression is a relationship that describes the 
dependence of the expected value of the dependent random variable for a given value of the 
independent non-random variable. In statistical relationships, if only two values are involved: 


Regressor. The variable, that forms the basis of estimation or prediction, is called the regressor. 
It is also called as the predictor variable or independent variable or controlled variable or 
explanatory variable. It is usually denoted by x. 


Regressand. The variable, whose resulting value depends upon the selected value of the 
independent variable, is called the regressand. It is also called as the response variable or the 
predictand variable or dependent vanable or explained variable. It is usually denoted by Y. 


The values of the independent variable x are determined by the experimenter and they 
are fixed in advance. They are arbitrarily selected constants and thus have no error attached with 
them. The independent variable is not random but a mathematical variable and we can choose the 
values we give to it. On the other hand, however, the problem is usually complicated by the fact 
that the dependent variable is subject to experimental variation or scatter. Besides depending 
upon the regressor variable, there is a random error in determining the response variable. Thus the
response variable possesses a random character: it is left free to take on any value that may
possibly be associated with a given value of the independent variable.


Let Y be the response variable and x be the regressor variable, and let $\mu_{Y|x} = E(Y \mid x)$ be
the expected value of the distribution of the random variable Y for a given value of the
non-random variable x. Then the simple regression is given by

$$\mu_{Y|x} = f(x)$$

where f(x) is a function that describes the relationship between the regressor x and the 
response Y and f(x) may be of linear, quadratic, exponential, geometric, or any other form. 
14.2.2 Regression Function. When we look for a relationship $\mu_{Y|x} = f(x)$, where the
function f(x) is to be determined, i.e., given the points only, we have to 'work backwards' or
regress to the original function f(x). Hence this function is called the regression function.




14.2.3. Regression Curve. The regression curve is the locus (a continuous set of points) of 
the expected value of the response variable for given values of the regressor variable. If several 
measurements are made on the response variable Y at the same value of the regressor variable x, 
then the results will form a distribution. The curve which joins the expected values of these
distributions for different values of x is called the simple regression curve of Y on x. 


14.3 CURVE FITTING 


Curve fitting is a process of estimating, from an observed sample, the parameters of the 
population regression function of a response variable on a regressor variable. 


14.3.1 Least Squares Principle. The principle of least squares says that the sum of squares of 
the residuals of observed values from their corresponding estimated values should be the least 
possible. This principle was given by a French mathematician Adrien Legendre. 


14.3.2 Least Squares Fit. Among all the curves approximating a given set of data, the curve
for which the sum of squares of the residuals of the observed values from their corresponding
estimated values is the least is called a least squares fit.


For a given set of observed data, different curves have different values of the sum of 
squares of the residuals. The best fitting curve is the one having the smallest possible value of the 
sum of squares of the residuals. To avoid the personal bias in fitting a curve to observed data, the 
method of least squares is used. 
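
The idea that different curves give different sums of squared residuals can be illustrated with a short Python sketch. The five data points and the three candidate lines below are illustrative assumptions chosen for this sketch, not values prescribed by the method; the function name `sum_sq_residuals` is likewise hypothetical.

```python
def sum_sq_residuals(xs, ys, a, b):
    """Sum of squared residuals of the straight line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Illustrative data (an assumption for this sketch).
xs = [0, 1, 2, 3, 4]
ys = [1.0, 1.8, 3.3, 4.5, 6.3]

# Three candidate lines given as (intercept, slope); also assumptions.
candidates = {
    "line A": (1.0, 1.0),
    "line B": (0.5, 1.5),
    "line C": (0.72, 1.33),
}

# Sum of squared residuals for each candidate line.
sse = {name: sum_sq_residuals(xs, ys, a, b) for name, (a, b) in candidates.items()}

# Among these candidates, the best fitting line has the smallest value.
best = min(sse, key=sse.get)
```

Comparing the entries of `sse` shows how the criterion ranks the candidates without any personal judgement about which line "looks" best.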


14.3.3 Scatter Diagram. The scatter diagram is a set of points in a rectangular co-ordinate 
system (with x measured horizontally and y measured vertically), where each point represents 
an observed pair of values. To aid in determining an equation connecting the two variables, a first 
step is the collection of the data showing the paired values of the variables under consideration. 


Let us suppose that n pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are
made on two variables. The next step in the investigation is to plot the data on a graph to get a
scatter diagram. The choice of the regression curve to fit may be influenced by theory, by
experience or simply by looking at the scatter diagram. For example, the experiment may have
been designed to verify a particular relationship between the variables. Alternatively, the functional
form may be selected after inspecting the scatter diagram, as it would be useless to try to fit a
straight line to some data if the relationship was clearly curvilinear. In practice the experimenter
may choose the one which gives the best fit.


It is often possible to see, by looking at the scatter diagram, that a smooth curve can be
fitted to the data. In particular, if a straight line can be fitted to the data, then we say that a linear 
relationship exists between two variables, otherwise the relationship is curvilinear. A visual 
examination of a scatter diagram gives some useful indications of the nature and strength of the 
relationship between two variables and aids in choosing the appropriate type of model for 
estimation. 


For example, if the points on the scatter diagram tend to run from the lower left side to 
the upper right side (that is, if the Y variable tends to increase as x increases), there is said to be
a direct relationship between the two variables. On the other hand, if the points on the scatter
diagram tend to run from the upper left side to the lower right side (that is, if the variable Y tends
to decrease as x increases), there is said to be an inverse relationship between the two variables.
The scatter diagram gives an indication whether a straight line appears to be an adequate 




description of the average relationship between two variables. If a straight line is used to describe 
the average relationship between two variables, a linear relationship is said to exist. If the points 
on the scatter diagram appear to lie along a curve, a curvilinear relationship is said to be present. 


[Figure 14.1: scatter diagrams in panels (a) to (f); panel (e) No Apparent Relationship; panel (f) Direct Linear Relationship with wider scatter than in (a)]


Fig. 14.1 Types of relationships found in scatter diagrams 


Parts (a), (b), (c) and (d) of Fig. 14.1 show direct linear, inverse linear, direct
curvilinear and inverse curvilinear relationships. The points tend to follow a straight line with
positive slope in (a), a straight line with negative slope in (b), a curve with positive slope in
(c), and a curve with negative slope in (d). Of course the relationships are not always so
obvious.


In (e) the points appear to follow a horizontal line. This type of scatter diagram 
depicts “no correlation” or no evident relationship between x and Y variables because the 
horizontal line implies no change, on the average, as x increases. In (f ) the points follow a 
straight line with positive slope as in (a) but there is a much wider scatter of points around the 
line than in (a). 


Note that a scatter diagram is primarily used to determine the appropriateness of a 
particular type of equation for describing the data. The approximate “goodness of fit” of the 
equation is also apparent from a scatter diagram; for example, the fit in (a) is quite good as
compared to the fit in (f). However, "goodness of fit" can and should be defined and determined
precisely. 















14.4 SIMPLE LINEAR REGRESSION

If the simple regression describes the dependence of the expected value of the dependent
random variable Y as a linear function of the independent non-random variable x, then the
regression is called simple linear regression. It is given by

$$\mu_{Y|x} = \alpha + \beta x$$

which implies that $\alpha = \mu_{Y|x}$ when x = 0. Thus $\alpha$ is the intercept of the line along the y-axis.
The $\beta$ indicates the change in the mean of the probability distribution of Y per unit increase
in x.


14.4.1 Simple Linear Regression Coefficient. The simple linear regression coefficient is the
relative change in the expected value of the dependent random variable with respect to a unit
increase in the independent non-random variable. It is denoted by $\beta$. The slope of the line, $\beta$,
remains constant at each value of x.


It is measured by $\tan\theta$, where $\theta$ is the angle made by the line with the positive side of the
x-axis. The slope of the line depends upon the value of $\beta$. If the value of $\beta$ is positive, the
line will slope upward like the solid line in Fig. 14.2. On the other hand, if the value of $\beta$ is
negative, the line will slope downward like the broken line in Fig. 14.2.





Fig. 14.2 Simple linear regression 
14.5 THE SIMPLE LINEAR REGRESSION MODEL 


We know that as one's income increases, there is a tendency to spend more. What kind of relation is
there between income and expenditure? Is it proportional, or is there some other form of
relationship, and how close is this relationship between income and expenditure? Certainly there is no
functional relationship between disposable income and consumption expenditure. Now let Y
denote the consumption expenditure and x denote the disposable income.


Let us suppose that we have already divided our households into various groups on the
basis of income levels. We do not expect that all the households within the group which have
some given (fixed, predetermined) income x will display an identical expenditure. Some will
spend more than the others, some will spend less, but we do expect a clustering of the expenditure
figures around a central value with some variance. For each possible value of x chosen
non-randomly there are several values of Y that could occur. Thus Y becomes a random







variable that possesses a distribution or population of associated y values for any given value of
x. This distribution of associated y values, for any given x, is described either by a probability
density function $f(y \mid x)$ or by a probability mass function $p(y \mid x)$ if the population has a
discrete set of possible values. This distribution represents the relative likelihood of different
values of Y occurring.

The mean of each probability distribution of Y values varies in some constant and
systematic manner with the independent variable x. The mean of any distribution of Y for
given x will be denoted by $\mu_{Y|x} = E(Y \mid x)$ and the variance of this distribution by
$\sigma^2_{Y|x} = \mathrm{Var}(Y \mid x)$. These are unknown parameters. They are constant for any fixed value of x
but may vary between the distributions of $Y_i$ for different $x_i$. The mean of Y for all values will
be denoted by $\mu_Y$ and the variance by $\sigma_Y^2$ or $\sigma^2$.


It is not unreasonable to assume that the random variable Y depends on the associated x
value only through its mean, that is $\mu_{Y|x} = f(x)$, but any higher moments of Y do not depend
on x. Thus we shall assume that the mean value of the random variable Y depends upon the
associated x value but the variance does not. We shall further assume that $\sigma^2_{Y|x}$ is constant for
all values of x, i.e., $\sigma^2_{Y|x} = \sigma^2$. Hence we assume that all the means $\mu_{Y|x}$ lie on a continuous
curve called the regression curve. The particular form of the regression curve is arbitrary, of
course, and varies from one application to another. We shall only concentrate our attention, for
the time being, on the simplest type of regression curve, namely, the straight line (a linear
dependence) but shall give procedures for more general models.

14.5.2 Mathematical Formulation of Regression Model. Our observed paired values of x
and Y are only sample values from a large population. However, for the moment we are concerned
with constructing a model for the population of all possible paired values. Suppose, for example,
that a linear relationship is considered to be appropriate, that is, the average relationship between
the dependent random variable Y and the independently varying non-random variable x is assumed
to be linear. We are interested in the conditional expectation $\mu_{Y|x} = E(Y \mid x)$. By assuming
that Y and x are linearly related, we are saying that all possible conditional means $\mu_{Y|x}$, which
might be calculated one for each possible value of x, must lie on a single straight line. This line is
called the population regression line. To specify this line we need to know its slope and intercept.
Let $\alpha$ be the y-intercept and $\beta$ be the slope of the line. The population regression line is written
as follows:

$$\mu_{Y|x} = \alpha + \beta x$$

This line is unknown. When some exact value of x is specified from its domain, it is customary
to denote this value as $x_i$. Associated with this value $x_i$ of the independent non-random variable
x, there exists a random variable $Y_i$ with a distribution or population with mean $\mu_{Y|x_i}$ and
variance $\sigma^2_{Y|x_i}$. Assume that

$$\mu_{Y|x_i} = \alpha + \beta x_i, \qquad \sigma^2_{Y|x_i} = E(Y_i - \mu_{Y|x_i})^2 = \sigma^2$$










Now we define the deviation of the random variable $Y_i$ from its unknown mean $\mu_{Y|x_i}$ and call
this deviation the population regression error. It is usually called the random "error" and
denoted by $\varepsilon_i$. Therefore, we can define the random variable $\varepsilon_i$ as

$$\varepsilon_i = Y_i - \mu_{Y|x_i} = Y_i - (\alpha + \beta x_i) = Y_i - \alpha - \beta x_i$$


There are three generally recognized sources of errors in such regression problems: 
(1) specification ( or equation ) error, arising from the omission of one or more relevant 
independent variables; (2) sampling error, arising from random variation of observations around 
their expected value and (3) measurement error, arising from the lack of precision in measuring 
variables. These errors are assumed to have zero mean and the constant variance identical to the 
variance of Y for a given value of x. We can now define what is called the population regression 
model as 
$$Y_i = \alpha + \beta x_i + \varepsilon_i$$

where
$x_i$ = a predetermined value of a non-random variable.
$Y_i$ = a random variable associated with $x_i$, with mean $\mu_{Y|x_i} = \alpha + \beta x_i$ and variance $\sigma^2_{Y|x_i} = \sigma^2$.
$\alpha$ = the population y-intercept of the regression line.
$\beta$ = the population slope of the regression line, also known as the population regression coefficient.
$\varepsilon_i$ = the deviation $(Y_i - \mu_{Y|x_i})$ in the population.

This model is said to be simple, linear in parameters and linear in independent variable. 
It is simple, in that there is only one independent variable, linear in the parameters because no 
parameter appears as an exponent or is multiplied or divided by another parameter, and linear in
the independent variable, because the variable appears only in the first power. 


14.5.3 The Sample Simple Linear Regression Model. In our population regression model $\alpha$,
$\beta$, $\mu_{Y|x}$ and $\sigma^2$ are unknown parameters. We wish to estimate these parameters statistically on
the basis of our sample observations on x and Y, and we may wish to test hypotheses and
construct confidence intervals about these parameters. In this regard sampling is accomplished as
follows:

(i) A set of n values of x in its domain is observed and denoted by $x_1, x_2, \ldots, x_n$. The
x's are not random variables, but they may be selected either by some random procedure
or by purposeful selection.

(ii) Each $x_i$ determines a distribution or population whose mean is $\alpha + \beta x_i$ and whose
variance is $\sigma^2$. From this distribution a value (a sample of size one) is selected at
random and denoted by $Y_i$.




Thus we have a set of n pairs of observations denoted by $(Y_1, x_1), (Y_2, x_2), \ldots, (Y_n, x_n)$,
which we have written to stress the fact that each sample value of Y that we take has an associated x
value. The values of x may or may not be all distinct but, as we shall see, we must have at least
two different values of x represented if we are to estimate both $\alpha$ and $\beta$. We can write the n
actually observed sample pairs as $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Without incorporating any
further assumptions into our model, we can obtain estimates of $\alpha$, $\beta$ and $\mu_{Y|x}$. It is customary
to let a be the best estimate of $\alpha$, b be the best estimate of $\beta$ and $\hat{y}$ be the resulting
estimate of $\mu_{Y|x}$. The line determined by a, b and $\hat{y}$ is called the best fitted regression line. This
line has the same form as the population line. Thus the sample simple linear regression is

$$\hat{y} = a + bx$$

where
$\hat{y}$ = the ordinate of the estimated line for any given value of x, which is the best point estimate of $\mu_{Y|x}$
a = the y-intercept of the estimated line, which is the best point estimate of $\alpha$
b = the slope of the estimated line, which is the best point estimate of $\beta$


Thus, if $x_i$ is a specific value of x, then

$$\hat{y}_i = a + bx_i$$

is the equation for finding $\hat{y}_i$, which is the best estimate of $\mu_{Y|x_i}$ for this value $x_i$.


We can specify a sample regression model just as we did the population regression
model. Again we need to define an error term, which in this case is the deviation of the
actual value $y_i$ from the predicted value $\hat{y}_i$. This error term is denoted by $e_i$, which means that
the sample regression error $e_i$ is an estimate of the population error $\varepsilon_i$. The errors $e_i$
(i = 1, 2, ..., n) are often called residuals or deviations or prediction errors.





Fig. 14.3 Residual of y 


Residual: $e_i = y_i - \hat{y}_i = y_i - (a + bx_i) = y_i - a - bx_i$
These residuals from the estimated line will be positive or negative as the actual value lies above
or below the line. Thus the sample regression model is

$$y_i = a + bx_i + e_i$$

14.5.4 Covariance of Two Variables. If $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are n pairs of
observations on two variables X and Y, then the covariance, denoted by $S_{xy}$, is defined as

$$S_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n}, \qquad i = 1, 2, \ldots, n$$







It is a measure of the linear mutual variability of the two variables. Its sign reflects the direction
of the mutual variability: if the variables tend to move in the same direction, the covariance is
positive; if the variables tend to move in opposite directions, the covariance is negative. It can be
easily expressed as

$$S_{xy} = \frac{\sum x_i y_i}{n} - \bar{x}\,\bar{y}$$

If $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ are two series of n observations each, and if
$z_i = x_i + y_i$, then

(i) $S_z^2 = S_x^2 + S_y^2 + 2S_{xy}$

(ii) $S_z^2 = S_x^2 + S_y^2$ when X, Y are independent variables
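
The covariance definition, its computational form, and the variance of a sum can be checked numerically. The sketch below is a minimal illustration in Python; it assumes the divisor n (as in the definition of the covariance above), and the data values and function names are hypothetical choices made for this sketch.

```python
def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Covariance with divisor n: sum of products of deviations over n."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n

def covariance_shortcut(xs, ys):
    """Computational form: mean of products minus product of means."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / n - mean(xs) * mean(ys)

def variance(values):
    """Variance with divisor n, i.e. the covariance of a series with itself."""
    return covariance(values, values)

# Illustrative data (an assumption for this sketch).
xs = [0, 1, 2, 3, 4]
ys = [1.0, 1.8, 3.3, 4.5, 6.3]
zs = [x + y for x, y in zip(xs, ys)]   # z_i = x_i + y_i

s_xy = covariance(xs, ys)
var_z_direct = variance(zs)
var_z_identity = variance(xs) + variance(ys) + 2 * s_xy
```

Here `var_z_direct` and `var_z_identity` agree up to floating-point rounding, and the covariance is positive because the two series move in the same direction.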


14.5.5 Least Squares Point Estimation of $\alpha$, $\beta$ and $\mu_{Y|x}$ (Fitting of Straight Line). We
now face the problem of estimating the linear regression between a dependent random variable Y
and an independently varying non-random variable x given a sample of y values with their
associated values of x. A general method of estimating the parameters of a regression line is the
method of least squares which is explained in the following theorem.


Theorem 14.1 Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be n observed values of a random
variable Y, with their associated x values, where the regression line is $\mu_{Y|x} = \alpha + \beta x$.
(i) The least squares line is given by $\hat{y} = a + bx$, where

$$b = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{n\sum x_i^2 - (\sum x_i)^2} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{S_{xy}}{S_x^2}$$

$$a = \frac{\sum y_i - b\sum x_i}{n} = \bar{y} - b\bar{x}$$

(ii) The least squares line always passes through the point of means $(\bar{x}, \bar{y})$.
(iii) The least squares estimate of $\mu_{Y|x_i}$ is

$$\hat{y}_i = \bar{y} + b(x_i - \bar{x})$$


Example 14.1 The following sample of 8 grade point averages and marks in matriculation 
was observed for students from a college. 





Score | 480 490 510 510 530 550 610 640
2s 2.9 oho Sila 2 Sema eI NS OS Dimas 


Find the least squares line. Estimate the mean GPA of students scoring 600 marks. 


Solution. The required sums are n = 8, $\sum x_i = 4320$, $\sum y_i = 24.8$, $\sum x_i^2 = 2355800$ and $\sum x_i y_i = 13492$.





The least squares estimates a and b are

$$b = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{n\sum x_i^2 - (\sum x_i)^2} = \frac{8(13492) - (4320)(24.8)}{8(2355800) - (4320)^2} = 0.00435$$

$$a = \frac{\sum y_i - b\sum x_i}{n} = \frac{24.8 - 0.00435(4320)}{8} = 0.751$$

The best fitted line is $\hat{y} = 0.751 + 0.00435x$
For x = 600, we have $\hat{y} = 0.751 + 0.00435(600) = 3.361$
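
The arithmetic of Example 14.1 can be reproduced with a few lines of Python. This is a minimal sketch using only the sums quoted in the example; the function name `least_squares_from_sums` is a hypothetical helper introduced here.

```python
def least_squares_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for the least squares line y = a + b*x from summary sums."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

# Sums taken from Example 14.1.
a, b = least_squares_from_sums(n=8, sum_x=4320, sum_y=24.8,
                               sum_xy=13492, sum_x2=2355800)

# Estimated mean GPA for students scoring 600 marks.
gpa_at_600 = a + b * 600
```

Because the code keeps full precision for b before computing a, the intercept differs slightly in the third decimal place from the hand calculation, which rounds b to 0.00435 first; the estimate at x = 600 agrees to about three decimal places.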


14.5.6 Properties of the Least Squares Line. The line fitted by the method of least squares
has a number of properties worth noting.


(1) The sum of the residuals is zero, that is

$$\sum e_i = \sum (y_i - \hat{y}_i) = 0$$

However, rounding errors may, of course, be present in any particular case. Hence, in
minimizing $\sum e_i^2$ the least squares method automatically sets $\sum e_i = 0$.
(2) The sum of the observed values $y_i$ equals the sum of the fitted values $\hat{y}_i$

$$\sum y_i = \sum \hat{y}_i$$

Therefore, it follows that
(i) the mean of the fitted values $\hat{y}_i$ is the same as the mean of the observed values $y_i$

$$\frac{\sum \hat{y}_i}{n} = \frac{\sum y_i}{n}$$

(ii) $\sum (\hat{y}_i - \bar{y})$ is also equal to zero.
(3) The sum of the squares of the residuals $\sum e_i^2$ is minimum.


(4) The regression line always passes through the point of means $(\bar{x}, \bar{y})$, the centre of
gravity of the observed data. That is, whenever $x_i = \bar{x}$, we have $\hat{y}_i = \bar{y}$.




Example 14.2 Given the following data

$x_i$ | 0 1 2 3 4
$y_i$ | 1.0 1.8 3.3 4.5 6.3

(a) Determine the least squares line taking x as independent variable.
(b) Find the estimated values for given values of x and show that
(i) $\sum y_i = \sum \hat{y}_i$
(ii) $\sum e_i = 0$
(c) Calculate the sum of the squares of the residuals.
(d) Verify that $\sum e_i^2 = \sum y_i^2 - a\sum y_i - b\sum x_i y_i$.


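Solution. (a) The estimated regression line is $\hat{y} = a + bx$

From the data, n = 5, $\sum x_i = 10$, $\sum y_i = 16.9$, $\sum x_i y_i = 47.1$ and $\sum x_i^2 = 30$.

The least squares estimates a and b are

$$b = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{n\sum x_i^2 - (\sum x_i)^2} = \frac{5(47.1) - (10)(16.9)}{5(30) - (10)^2} = 1.33$$

$$a = \frac{\sum y_i - b\sum x_i}{n} = \frac{16.9 - 1.33(10)}{5} = 0.72$$

The best fitted line is $\hat{y} = 0.72 + 1.33x$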
(b) The estimated values $\hat{y}_i$ for the given values of x and the residuals $e_i = y_i - \hat{y}_i$ are
obtained as shown in the following table.

$x_i$ | $y_i$ | $\hat{y}_i = 0.72 + 1.33x_i$ | $e_i = y_i - \hat{y}_i$ | $e_i^2$ | $y_i^2$
0 | 1.0 | 0.72 + 1.33(0) = 0.72 | 0.28 | 0.0784 | 1.00
1 | 1.8 | 0.72 + 1.33(1) = 2.05 | -0.25 | 0.0625 | 3.24
2 | 3.3 | 0.72 + 1.33(2) = 3.38 | -0.08 | 0.0064 | 10.89
3 | 4.5 | 0.72 + 1.33(3) = 4.71 | -0.21 | 0.0441 | 20.25
4 | 6.3 | 0.72 + 1.33(4) = 6.04 | 0.26 | 0.0676 | 39.69
Sum | 16.9 | 16.90 | 0 | 0.2590 | 75.07

It is verified that
(i) $16.9 = \sum y_i = \sum \hat{y}_i = 16.9$
(ii) $\sum e_i = \sum (y_i - \hat{y}_i) = 0$
(c) The sum of the squares of the residuals is $\sum e_i^2 = 0.259$







(d) We are to verify that

$$\sum e_i^2 = \sum y_i^2 - a\sum y_i - b\sum x_i y_i$$

$$0.259 = 75.07 - 0.72(16.9) - 1.33(47.1)$$

$$0.259 = 0.259$$
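
The properties checked by hand in Example 14.2 can also be verified programmatically. The following Python sketch fits the line to the five data pairs of the example and then tests the same identities; the function name `fit_line` is a hypothetical helper introduced for this sketch.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Data of Example 14.2, as listed in its solution table.
xs = [0, 1, 2, 3, 4]
ys = [1.0, 1.8, 3.3, 4.5, 6.3]

a, b = fit_line(xs, ys)
fitted = [a + b * x for x in xs]                 # estimated values
residuals = [y - f for y, f in zip(ys, fitted)]  # e_i = y_i - fitted_i

sum_residuals = sum(residuals)                   # property (1): zero up to rounding
sse = sum(e ** 2 for e in residuals)             # sum of squared residuals

# Shortcut of part (d): sum(y^2) - a*sum(y) - b*sum(x*y)
sse_shortcut = (sum(y * y for y in ys)
                - a * sum(ys)
                - b * sum(x * y for x, y in zip(xs, ys)))
```

Both `sse` and `sse_shortcut` reproduce the value 0.259 up to floating-point rounding, and the sum of the fitted values equals the sum of the observed values.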


14.5.7 Coding and Scaling. In many cases the process of coding and scaling by a linear 
transformation can simplify the job of estimating the regression line or curve. 




Theorem 14.2 The sample linear regression coefficient b is independent of change of origin 
but it is not independent of change of scale. 


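Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be n observed values of a random variable Y, with
their associated x values. Then the sample regression coefficient is

$$b_{yx} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

Let $u_i = \dfrac{x_i - p}{h} \Rightarrow x_i = p + hu_i \Rightarrow \bar{x} = p + h\bar{u}$
and $v_i = \dfrac{y_i - q}{k} \Rightarrow y_i = q + kv_i \Rightarrow \bar{y} = q + k\bar{v}$


In this transformation we choose the constants p, q and h, k so that the transformed values
$u_i$ and $v_i$ become as simple as possible. Then

$$b_{yx} = \frac{k}{h}\, b_{vu}$$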
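
Theorem 14.2 can be checked numerically: shifting the origin leaves the slope unchanged, while rescaling multiplies it by k/h. The sketch below uses illustrative data and constants (p, h, q, k) that are assumptions made for this example only; `slope` is a hypothetical helper.

```python
def slope(xs, ys):
    """Sample regression coefficient of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Illustrative data and coding constants (assumptions for this sketch).
xs = [10, 20, 30, 40, 50]
ys = [12.0, 15.0, 21.0, 24.0, 28.0]
p, h = 30, 10      # u_i = (x_i - p) / h
q, k = 20, 2       # v_i = (y_i - q) / k

us = [(x - p) / h for x in xs]
vs = [(y - q) / k for y in ys]

b_yx = slope(xs, ys)   # slope in the original units
b_vu = slope(us, vs)   # slope in the coded units

# Change of origin only (h = k = 1): the slope is unchanged.
b_shifted = slope([x - p for x in xs], [y - q for y in ys])
```

With these numbers `b_yx` equals `(k / h) * b_vu`, and `b_shifted` equals `b_yx`, in line with the statement of the theorem.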
A Special Coding and Scaling. If the values $x_1, x_2, \ldots, x_n$ of the independent variable x are
equally spaced at an interval h, then the calculations involved in solving the normal equations can be
made much simpler by taking the origin at $\bar{x}$ and choosing a suitable unit of measurement. The
choice of origin and unit is explained below in the two cases.
(i) If the sample size is odd, say n = 2m - 1, then we take the origin at the middle value
$x_m$, which is equal to $\bar{x}$, i.e. $\bar{x} = x_m$. If h is the common interval, we take h as a
new unit of measurement. Then, changing each $x_i$ into $u_i$ by the linear transformation
$u_i = (x_i - \bar{x})/h$, the variable u takes the values $-(m-1), -(m-2), \ldots, -1, 0, 1, 2, \ldots, (m-2), (m-1)$.
Thus, we get

$$\sum u_i = 0 = \sum u_i^3 = \sum u_i^5 = \cdots$$

(ii) If the sample size is even, say n = 2m, then we take the origin at the average of the
two middle values $x_m$ and $x_{m+1}$, which is equal to $\bar{x}$, i.e. $\bar{x} = (x_m + x_{m+1})/2$. If h
is the common interval, we take h/2 as a new unit of measurement. Then, changing each
$x_i$ into $u_i$ by the linear transformation $u_i = (x_i - \bar{x})/(h/2)$, the variable u takes the
values $-(2m-1), -(2m-3), \ldots, -3, -1, 1, 3, \ldots, (2m-3), (2m-1)$.
Thus, we get

$$\sum u_i = 0 = \sum u_i^3 = \sum u_i^5 = \cdots$$







The values of the controlled variable are coded into integers symmetrically about zero. When the
values of the coded variable u sum to zero, the least squares line of Y upon u becomes

$$\hat{y} = a + bu$$

where $a = \dfrac{\sum y_i}{n} = \bar{y}$ and $b = \dfrac{\sum u_i y_i}{\sum u_i^2}$

In the end, we must change the least squares line of Y on u into the least squares line of
Y on x by transforming back the coded variable u into the original variable x. Sometimes, for
the sake of further convenience, each observed value $y_i$ of the dependent variable can also be
transformed into $v_i$ by a linear transformation $v_i = (y_i - q)/k$ where q and k have arbitrary
values.





Example 14.3. The following table shows the tons of steel produced versus the number of
workers in a small steel mill.

[Table: Observation number i = 1, 2, ..., 8; Number of workers $x_i$; Tons of steel produced $y_i$]

Estimate the line of regression using $u = \dfrac{x - 4.5}{1/2}$.

Solution. We have $\bar{x} = 4.5$ and h = 1. Let $u_i = \dfrac{x_i - \bar{x}}{h/2} = \dfrac{x_i - 4.5}{1/2}$

[Table: computation table for i = 1, 2, ..., 8]



The estimated regression line of Y on u is $\hat{y} = a + bu$
The least squares estimates of a and b are

$$a = \frac{\sum y_i}{n} = \frac{96}{8} = 12$$

$$b = \frac{\sum u_i y_i}{\sum u_i^2} = \frac{182}{168} = 1.0833$$

The best fitted line of Y on u is $\hat{y} = 12 + 1.0833u$











Substituting $\dfrac{x - 4.5}{1/2}$ for u, we get the best fitted line of Y on x as

$$\hat{y} = 12 + 1.0833\left(\frac{x - 4.5}{1/2}\right) = 12 + \frac{1.0833}{0.5}(x - 4.5)$$

$$= 12 + 2.167(x - 4.5) = 12 + 2.167x - 9.75$$

$$= 2.25 + 2.167x$$
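
The coding steps of Example 14.3 can be traced in Python. The sketch below assumes the x values are 1, 2, ..., 8 (consistent with a mean of 4.5 and h = 1) and uses only the sums quoted in the solution, namely a total of 96 for the y values and 182 for the products of u and y; all variable names are hypothetical.

```python
# Equally spaced x values, assumed to be 1..8 (mean 4.5, interval h = 1).
xs = list(range(1, 9))
n = len(xs)
h = 1
x_bar = sum(xs) / n

# Even sample size: origin at the mean, unit of measurement h/2.
us = [(x - x_bar) / (h / 2) for x in xs]   # -7, -5, ..., 5, 7
sum_u = sum(us)
sum_u2 = sum(u * u for u in us)

# Sums quoted in the solution of Example 14.3.
sum_y = 96
sum_uy = 182

# Least squares line of Y on u (valid because the u values sum to zero).
a_u = sum_y / n
b_u = sum_uy / sum_u2

# Transform back: u = (x - x_bar) / (h / 2).
slope_x = b_u / (h / 2)
intercept_x = a_u - slope_x * x_bar
```

The back-transformed line has intercept 2.25 and slope about 2.167, matching the hand calculation.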


Example 14.4 The following data show, in convenient units, the yield Y of a chemical
reaction run at various different temperatures x.

n = 7, $\sum x_i = 980$, $\sum y_i = 27.4$, $\sum x_i y_i = 3958$,
$\sum x_i^2 = 140000$, $\sum y_i^2 = 115.54$

Assuming that a linear regression model $Y_i = \alpha + \beta x_i + \varepsilon_i$ is appropriate, estimate the
regression line of yield on temperature. Find the residual sum of squares.
Solution. The estimated regression line is $\hat{y} = a + bx$
The least squares estimates a and b are

$$b = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{n\sum x_i^2 - (\sum x_i)^2} = \frac{7(3958) - (980)(27.4)}{7(140000) - (980)^2} = 0.04357$$

$$a = \frac{\sum y_i - b\sum x_i}{n} = \frac{27.4 - 0.04357(980)}{7} = -2.1855$$

The best fitted line is $\hat{y} = -2.1855 + 0.04357x$


The sum of squares of the residuals is

$$\sum e_i^2 = \sum y_i^2 - a\sum y_i - b\sum x_i y_i = 115.54 - (-2.1855)(27.4) - 0.04357(3958) = 2.97$$
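
When only summary sums are available, as in Example 14.4, the residual sum of squares follows from the shortcut formula without computing individual residuals. The Python sketch below uses the sums of the example; the function name `residual_ss_from_sums` is a hypothetical helper.

```python
def residual_ss_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return (a, b, sse) for the least squares line from summary sums.

    sse uses the shortcut sum(y^2) - a*sum(y) - b*sum(x*y), which is valid
    only when a and b are the least squares estimates.
    """
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    sse = sum_y2 - a * sum_y - b * sum_xy
    return a, b, sse

# Sums taken from Example 14.4.
a, b, sse = residual_ss_from_sums(n=7, sum_x=980, sum_y=27.4,
                                  sum_xy=3958, sum_x2=140000, sum_y2=115.54)
```

The shortcut subtracts quantities of similar size, so rounding a and b too early can noticeably change the result; keeping full precision until the final step is the safer choice.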




14.5.8 Limitations of Linear Regression. There are a number of limitations and cautions that
must be kept in mind when using linear regression. They are:

Firstly, linear regression is applicable only to relationships that can be described by a straight
line. Non-linear regression methods exist to deal with some non-linear relationships. If you are in
doubt about whether the data are approximately linear, a scatter diagram will help you to decide.
Secondly, the procedure used to find the regression coefficients a and b will give us a linear
equation which is the best fit (i.e., has the lowest value of $\sum e_i^2$) for the data, even when a linear
relationship is non-existent. Therefore, a test of significance must be made to determine
whether the regression coefficient b is "real".

Thirdly, the regression equation predicts values of the dependent variable based on values of the
independent variable. It is therefore an asymmetrical measure. The regression equation predicting
Y based on x (called "regression of Y on x") cannot be used to derive the equation that will
predict x based on Y.

Finally, the regression equation holds only for the range of values actually observed. The 
regression equation will not necessarily hold beyond this range. 


Exercise 14.1
1. (a) What is a scatter diagram? Describe its role in the theory of regression.
(b) Explain what is meant by


(i) regression, (ii) regressand, (iii) regressor
(c) Explain what is meant by
(i) simple linear regression, (ii) simple linear regression coefficient.


2. (a) The following measurements of the specific heat of a certain chemical were made in
order to investigate the variation in specific heat with temperature.
Temperature (°C) $x_i$ | 0 10 20 30 40
Specific heat $y_i$ | 0.51 0.55 0.57 0.59 0.63
Plot the points on a scatter diagram and verify that the relationship is approximately
linear. Estimate the regression line of specific heat on temperature, and hence estimate
the value of the specific heat when the temperature is 25 °C.
($\hat{y} = 0.514 + 0.0028x$; $\hat{y} = 0.584$)


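(b) Determine the estimated regression equation $\hat{y} = a + bx$ in each of the following
cases
(i) n = 10, $\sum x_i = 20$, $\sum y_i = 260$, $\sum x_i y_i = 3490$, $\sum x_i^2 = 3144$
(ii) n = 100, $\bar{x} = 125$, $\bar{y} = 80$, $\sum x_i y_i = 1007425$, $\sum x_i^2 = 1585000$
(iii) $\bar{x} = 52$, $\bar{y} = 237$, $\sum (x_i - \bar{x})^2 = 2800$, $\sum (x_i - \bar{x})(y_i - \bar{y}) = 9871$
(iv) n = 8, $\bar{x} = 7$, $\bar{y} = 5$, $\sum x_i y_i = 364$, $\sum (x_i - \bar{x})^2 = 132$
($\hat{y} = 24.0864 + 0.9568x$; $\hat{y} = 38.75 + 0.33x$; $\hat{y} = 53.70 + 3.525x$;
$\hat{y} = 0.5459 + 0.6363x$)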










3. (a) Estimate the regression line of Y on x for the following data.
$x_i$ | 25 30 35 40 45 50
$y_i$ | 78 70 65 58 48 42
Is it possible to obtain from the equation you have just found
(i) an estimate for the value of x when y = 54?
(ii) an estimate for the value of y when x = 37?
In each case, if the answer is "Yes", calculate the estimate. If the answer is "No", say why not.
{ $\hat{y} = 114.4 - 1.45x$; (i) No, x is controlled; (ii) 61 }

(b) From n pairs of values $(x_i, y_i)$, i = 1, 2, ..., n, the following quantities are
calculated
n = 20, $\sum x_i = 400$, $\sum y_i = 220$,
$\sum x_i^2 = 8800$, $\sum y_i^2 = 2620$, $\sum x_i y_i = 4300$
Find the linear regression equations of y on x and x on y. Which would be the more
useful if
(i) x is the age (in years) and y is the reaction time (in milliseconds) of
20 people;
(ii) x is the cost (in thousand Rs.) and y the floor-space (in 100 ft²) of
20 buildings
{ $\hat{y} = 13.5 - 0.125x$; $\hat{x} = 25.5 - 0.5y$; (i) y on x; (ii) x on y }

4. (a) The following table shows the ages x and systolic blood pressures Y of 12 women.
Age (years) $x_i$ | 56 42 72 36 63 47 55 49 38 42 68 60
Blood pressure $y_i$ | 147 125 160 118 149 128 150 145 115 140 152 155
Assuming that a linear regression model $Y_i = \alpha + \beta x_i + \varepsilon_i$ is appropriate, estimate the
linear regression of blood pressure on age. Estimate the expected blood pressure of a
woman whose age is 45 years. What is the change in blood pressure for a unit change
in age?
($\hat{y} = 80.78 + 1.138x$; 132; 1.138)

(b) Suppose that four randomly chosen plots were treated with various levels of fertilizer,
resulting in the following yields of corn.
Fertilizer (kg/Acre) $x_i$ | 100 200 400 500
Production (Bushels/Acre) $y_i$ | 70 70 80 100
(i) Estimate the linear regression $\mu_{Y|x} = \alpha + \beta x$ of production Y on fertilizer x.
(ii) Estimate the yield when no fertilizer is applied.
(iii) Estimate the yield when the average amount of fertilizer is applied.
(iv) Estimate how much yield is increased for every kilogram of fertilizer applied.
{ (i) $\hat{y} = 59 + 0.070x$; (ii) when x = 0, $\hat{y} = 59$; (iii) when $x = \bar{x} = 300$,
$\hat{y} = 80$; (iv) 0.070 bushels per kg of fertilizer }





5. (a) Describe the properties of the least squares regression line.
(b) Determine the regression line and estimate the weight of a student whose height is
68 inches.
Height (inches) $x_i$ | 72 66 67 69 74 61 66 62 70 63
Weight (pounds) $y_i$ | 178 141 158 165 180 133 159 140 160 136
Find also the estimated values for given values of height. Show that the sum of the
estimated values is equal to the sum of the observed values of weight. Find the
deviations $e_i = y_i - \hat{y}_i$. Show that these deviations add to zero.
($\hat{y} = -94.4 + 3.72x$; 158.76)

6. (a) Four identical money boxes contain different numbers of a particular type of coin and no
coins of other types. The information on the combined weights is given below.
Number of coins in box $x_i$ | 10 20 30 40
Combined weight of coins and box $y_i$ | 312 509 682 865
Estimate the regression line of Y on x. Estimate from your regression line,
(i) the weight of an empty box,
(ii) the mean weight of a single coin.
State the co-ordinates of one point through which the line of regression of Y upon x must pass.
{ $\hat{y} = 134 + 18.32x$; (i) 134, (ii) 18.32; (25, 592) }

(b) Fifteen boys took two examination papers in the same subject and their marks as
percentages were as follows, where each boy's marks are in the same column.
Paper I $x_i$ | 65 73 42 52 84 60 70 79 60 83 57 77 54 66 89
Paper II $y_i$ | 78 88 60 73 92 77 84 89 70 89 73 88 70 85 89
Calculate the equation of the line of regression of Y on x. Two boys were each absent
from one paper. One scored 63 on paper I, the other scored 81 on paper II. In
which case can you use your regression line to estimate the mark that the boy should be
allocated for the paper he did not take, and what is that mark?
($\hat{y} = 35.53 + 0.665x$; 63 on I gives 78 on II)

7. (a) The following data shows the son's height and father's height.
Father's height (inches) $x_i$ | 59 61 63 65 67 69 71 73 75
Son’s height (inches) __J 66... 67, 167) 468/69 a Ola 2ieT2 
Estimate the regression line Lyte = a + B x of son's height on father's height using 


u, = (x, -67)/2 and v; = y,; — 68. Predict the mean height of sons whose fathers 


are 70 inches in height. 
(y = 35.95 + 0.4833 x; 69.78) 


For 9 observations on supply X and price Y the following data was obtained 
E(x; — 90) = —25, . L(x, = 90)? = 301,.., Z(y, — 127.) =+12; 
L(y; -— 127) = 1006, Xx; — 90)(y, - 127) = - 469 
Obtain the estimated line of regression of X on Y and estimate the supply when the 





price is Rs. 125.
( $\hat{x} = 143.69 - 0.44y$; 88.69 )
(c) Number of revolutions x (per minute) and power y (hp) of a diesel engine are
$x_i$ | 400 500 600 700 800
$y_i$ | 580 1030 1420 1880 2310
Determine the regression line of the y-values on the x-values of the sample using
$x_i = 100u_i + 600$ and $y_i = 10v_i + 1400$. Estimate y when x = 750.
( $\hat{y} = -1142 + 4.31x$; 2090.5 )

8. (a) Fit a straight line taking x as the independent variable.
$3x_i + 2$ | 5 8 11 14 17 20 23 26
$3y_i - 2$ | 7 10 16 16 25 28 28 34
Also estimate y for x = 5/3.
( $\hat{y} = 1.714 + 1.286x$; 3.86 )
(b) Fit a least squares line to the following data taking (i) Y as dependent variable, (ii) X as
dependent variable.
$x_i$ | 1 3 4 6 8 9 11 14
$y_i$ | 1 2 4 4 5 7 8 9
Show that the two least squares lines obtained intersect at the point $(\bar{x}, \bar{y})$. Estimate
the mean value of y when x = 7. Estimate the mean value of x when y = 6.
( $\hat{y} = 0.5455 + 0.6364x$, $\hat{y} = 5$ for x = 7;
$\hat{x} = -0.5 + 1.5y$, $\hat{x} = 8.5$ for y = 6 )

9. (a) A random sample of 5 pairs of observations $(x_i, y_i)$ is given below.
$x_i$ | 3 2 5 1 4
$y_i$ | 13 9 27 8 18
Determine the least squares linear regression $\hat{y} = a_{yx} + b_{yx}x$ and estimate y for
x = 6. Also find the least squares linear regression $\hat{x} = a_{xy} + b_{xy}y$ and use this to
find that value of y for which x = 6. Account for the difference.
( $\hat{y} = 0.9 + 4.7x$, $\hat{y} = 29.1$ for x = 6; $\hat{x} = 0.09 + 0.194y$, y = 30.46 for
x = 6, which is a useless estimate, because the regression analysis does not permit the
inverse use of the least squares line. )
(b) Compute the regression coefficients in each of the following cases:
(i) n = 10, $\sum (x_i - \bar{x})^2 = 170$, $\sum (y_i - \bar{y})^2 = 140$, $\sum (x_i - \bar{x})(y_i - \bar{y}) = 92$
(ii) $\sum (x_i - \bar{x})(y_i - \bar{y}) = 148$, $s_x = 7.933$, $s_y = 16.627$, n = 15
( (i) $b_{yx} = 0.54$, $b_{xy} = 0.66$; (ii) $b_{yx} = 0.16$, $b_{xy} = 0.04$ )




14.6 SIMPLE LINEAR CORRELATION


The simple linear correlation measures the strength or closeness of linear relationships 
between two variables. The purpose of simple linear correlation is to determine whether or not 
two variables are related, that is, whether one variable tends to increase ( or decrease ) as the 
other variable increases. The correlation analysis is performed keeping in view the following two 
aspects. 


(i) It measures the closeness of the linear regression to the distribution of observations of a 
dependent variable with associated values of an independent variable. 


(ii) It measures the degree (extent or strength) of covariability between two variables. 


We have discussed this first aspect in the preceding chapter. We shall now discuss the 
second aspect of correlation. This approach to the problem of understanding the relationship 
between two variables is to leave the type or form of the relationship unspecified and concentrate 
on measuring the strength of the relationship itself. 


14.6.1. Positive Correlation. The correlation is said to be positive (or direct) if the two 
random variables tend to move in the same direction, i. e., increase (or decrease) simultaneously. 
That is, the correlation is positive if the least squares regression lines have positive slopes. 


Perfect Positive Correlation. The correlation is said to be perfect positive if the relationship 
between the two random variables is perfectly linear with positive slope. 


14.6.2 Negative Correlation. The correlation is said to be negative (or inverse) if the two
random variables tend to move in opposite directions, i.e., one random variable decreases as the
other random variable increases. That is, the correlation is negative if the least squares regression
lines have negative slopes.


Perfect Negative Correlation. The correlation is said to be perfect negative if the relationship
between the two random variables is perfectly linear with negative slope.


14.6.3 No Correlation. If one least squares regression line is horizontal and the other least 
squares regression line is vertical then there is no correlation between the two random variables. 


That is, if X and Y are independent, then Cov(X, Y) = 0, which implies that $\rho = 0$, and we
say that there is no correlation.


14.7 CORRELATION ANALYSIS 


One of the most widely used statistical techniques applied by statisticians is correlation
analysis. In purely correlation problems both the variables X and Y are random and the
relationship between them is considered simultaneously and symmetrically. Examples of
correlation problems are: (i) heights and weights of persons, (ii) ages of husbands and ages of
wives at the time of their marriages, (iii) I.Q. of brothers and I.Q. of sisters, (iv) marks of
students in economics and in statistics, (v) income and I.Q. of persons, (vi) demand and supply
of a commodity, (vii) daily wages and overtime wages, (viii) gold prices and silver prices,
(ix) the height and the circumference of head of babies at the time of their birth, (x) the greatest
and the smallest diameters of hen eggs, etc.


In correlation problems, we sample from a population, observing two measurements on
each individual in the sample. For example, a person is selected at random, and both his height
and weight are left free to take any possible values. Thus we have a joint distribution of two
random variables, or we may say that we have a bivariate distribution. The data are assumed to be
obtained by taking a random sample of values of X and Y.







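14.7.1 Sample Correlation Coefficient. If $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ is a random
sample of n pairs of observations from a bivariate population, then the sample correlation
coefficient, denoted by r or more appropriately $r_{xy}$, is defined as

\[ r = \frac{s_{xy}}{s_x s_y} \]

It can be expressed as

\[
\begin{aligned}
r = \frac{s_{xy}}{s_x s_y}
&= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})/n}{\sqrt{\{\sum (x_i - \bar{x})^2/n\}\{\sum (y_i - \bar{y})^2/n\}}} \\
&= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}
 = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{(\sum x_i^2 - n\bar{x}^2)(\sum y_i^2 - n\bar{y}^2)}} \\
&= \frac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sqrt{\{\sum x_i^2 - (\sum x_i)^2/n\}\{\sum y_i^2 - (\sum y_i)^2/n\}}} \\
&= \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{\sqrt{\{n\sum x_i^2 - (\sum x_i)^2\}\{n\sum y_i^2 - (\sum y_i)^2\}}}
\end{aligned}
\]

This r is the maximum likelihood estimate of $\rho$. The process of subtracting $\bar{x}$ and $\bar{y}$ indicates
that the origin has been shifted to $(\bar{x}, \bar{y})$.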
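
The equivalence of the deviation form and the raw-sum form of r can be checked numerically. The following is a minimal sketch, not part of the text: the function names and the small data set are illustrative assumptions.

```python
from math import sqrt

def pearson_r(xs, ys):
    """r from deviations about the means (second form of the definition)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def pearson_r_raw(xs, ys):
    """r from raw sums (last form): n*sum(xy) - sum(x)*sum(y) over the root."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Illustrative data (an assumption made for this sketch).
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]
print(pearson_r(xs, ys), pearson_r_raw(xs, ys))
```

Both functions should agree up to rounding error; the raw-sum form avoids computing the means first, which is why it is convenient for hand calculation.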
14.7.2 Properties of Sample Correlation Coefficient r. The sample correlation coefficient r
has the following properties.
(1) r is symmetrical with respect to the variables X and Y, that is

\[ r_{xy} = r_{yx} \]

(2) r is the covariance of values of the two variables X and Y measured in standard units,
that is

\[ r = \mathrm{Cov}(z_x, z_y) \]

(3) Change of Origin and Scale. The value of r remains unchanged if constants are added
to or subtracted from the values of the variables or if the values of the variables are
multiplied or divided by constants having the same sign, but the value of r changes in
sign only if the values of the variables are multiplied or divided by constants having
opposite signs. That is, the magnitude of the sample correlation coefficient |r| is
independent of change of origin and scale.

(4) r always lies between -1 and +1, i.e.,

\[ -1 \le r \le 1 \]

(5) |r| is the geometric mean of the two regression coefficients $b_{yx}$ and $b_{xy}$, that is

\[ r = (+/-)\sqrt{b_{yx} \times b_{xy}} \]

Thus

\[
r = \begin{cases}
+\sqrt{b_{yx} \times b_{xy}} & \text{if } b_{yx} \text{ and } b_{xy} \text{ are positive} \\
-\sqrt{b_{yx} \times b_{xy}} & \text{if } b_{yx} \text{ and } b_{xy} \text{ are negative}
\end{cases}
\]

(6) r is zero when one of the variables X or Y is constant.

Theorem 14.3. The correlation coefficient is independent of the origin and the scale of
measurement of the variables.

Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a random sample of n pairs of observations from a
bivariate population.

If we let $u_i = (x_i - p)/h$ and $v_i = (y_i - q)/k$, then

\[ r_{uv} = \frac{hk}{|h||k|}\, r_{xy} \]

That is,

\[
r_{uv} = \begin{cases}
r_{xy} & \text{if h and k have same signs} \\
-r_{xy} & \text{if h and k have different signs}
\end{cases}
\]

Similarly, if we let $u_i = p + h x_i$ and $v_i = q + k y_i$, then

\[ r_{uv} = \frac{hk}{|h||k|}\, r_{xy} \]

That is,

\[
r_{uv} = \begin{cases}
r_{xy} & \text{if h and k have same signs} \\
-r_{xy} & \text{if h and k have different signs}
\end{cases}
\]
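
Theorem 14.3 can be illustrated numerically. The sketch below is an illustration under stated assumptions, not a proof: the helper names, the data and the constants p, q, h, k are all chosen arbitrarily for the purpose.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

def change_origin_scale(values, origin, scale):
    """Return (value - origin) / scale for each value."""
    return [(v - origin) / scale for v in values]

# Illustrative data and constants (assumptions for this sketch).
xs = [2, 4, 5, 6, 8, 11]
ys = [18, 12, 10, 8, 7, 5]
r_xy = pearson_r(xs, ys)

# h and k with the same sign: r is unchanged.
r_same = pearson_r(change_origin_scale(xs, 5, 2),
                   change_origin_scale(ys, 10, 3))
# h and k with different signs: only the sign of r changes.
r_opposite = pearson_r(change_origin_scale(xs, 5, -2),
                       change_origin_scale(ys, 10, 3))
print(r_xy, r_same, r_opposite)
```

With these constants the first transformed value should equal $r_{xy}$ and the second should equal $-r_{xy}$, up to rounding error.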


Example 14.5 The heights and weights of eight men are given below.

Height (inches) $x_i$ | 78 89 97 69 59 79 68 61
Weight (pounds) $y_i$ | 125 137 156 112 107 136 123 104

(i) Calculate the correlation coefficient between the height and weight of eight men by
using the deviations from their means.
(ii) Again compute the correlation coefficient by taking the deviations of variable X from
70 and of variable Y from 120.
(iii) Do the results in (i) and (ii) agree?

Solution. (i) The coefficient of correlation between X and Y is computed from the following table.

$x_i$   $y_i$   $x_i - \bar{x}$   $y_i - \bar{y}$   $(x_i - \bar{x})^2$   $(y_i - \bar{y})^2$   $(x_i - \bar{x})(y_i - \bar{y})$
 78    125      3      0      9      0      0
 89    137     14     12    196    144    168
 97    156     22     31    484    961    682
 69    112     -6    -13     36    169     78
 59    107    -16    -18    256    324    288
 79    136      4     11     16    121     44
 68    123     -7     -2     49      4     14
 61    104    -14    -21    196    441    294
600   1000      0      0   1242   2164   1568

\[ \bar{x} = \frac{\sum x_i}{n} = \frac{600}{8} = 75, \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{1000}{8} = 125 \]

\[ r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{1568}{\sqrt{(1242)(2164)}} = 0.956 \]

(ii) Let the assumed mean for x be p = 70, and $u_i = x_i - p = x_i - 70$. Let the assumed
mean for y be q = 120, and $v_i = y_i - q = y_i - 120$.

$x_i$   $y_i$   $u_i = x_i - 70$   $v_i = y_i - 120$   $u_i^2$   $v_i^2$   $u_i v_i$
 78    125      8      5     64     25     40
 89    137     19     17    361    289    323
 97    156     27     36    729   1296    972
 69    112     -1     -8      1     64      8
 59    107    -11    -13    121    169    143
 79    136      9     16     81    256    144
 68    123     -2      3      4      9     -6
 61    104     -9    -16     81    256    144
               40     40   1442   2364   1768

The coefficient of correlation between U and V is

\[ r_{uv} = \frac{n\sum u_i v_i - (\sum u_i)(\sum v_i)}{\sqrt{\{n\sum u_i^2 - (\sum u_i)^2\}\{n\sum v_i^2 - (\sum v_i)^2\}}} = \frac{8(1768) - (40)(40)}{\sqrt{\{8(1442) - (40)^2\}\{8(2364) - (40)^2\}}} = 0.956 \]

(iii) The results in (i) and (ii) are the same, since $r_{uv} = 0.956 = r_{xy}$.


Example 14.6 The following data were obtained for a sample of 10 persons from a height
and weight distribution.

$\sum x_i = 700$, $\sum y_i = 1550$, $\sum x_i^2 = 49120$, $\sum y_i^2 = 240550$, $\sum x_i y_i = 108650$

Compute the coefficient of correlation.

Solution. The coefficient of correlation between X and Y is

\[ r = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{\sqrt{\{n\sum x_i^2 - (\sum x_i)^2\}\{n\sum y_i^2 - (\sum y_i)^2\}}} = \frac{10(108650) - (700)(1550)}{\sqrt{\{10(49120) - (700)^2\}\{10(240550) - (1550)^2\}}} = 0.79 \]
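
When only the summary sums are available, as in Example 14.6, the raw-sum formula can be applied directly. A minimal sketch follows; the function name and argument names are hypothetical.

```python
from math import sqrt

def r_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Correlation coefficient from raw sums (no individual observations needed)."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Summary figures quoted in Example 14.6.
r = r_from_sums(10, 700, 1550, 49120, 240550, 108650)
print(round(r, 2))
```

The numerator works out to 1500 and the two bracketed terms to 1200 and 3000, which reproduces the value 0.79 of the example.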


14.7.3 Goodness of Fit of a Linear Regression Equation. One approach to correlation
analysis emphasizes the covariability of the two random variables. The other approach to
correlation analysis is related to regression analysis and provides a measure of the strength or
closeness of the linear relationship between two variables; thus the correlation coefficient is a
measure of the goodness of fit of the linear regression equation. Consider a sample from a
bivariate distribution of X and Y. There are two regression functions, each obtained by
considering that variable as dependent whose mean value is to be estimated and treating the other
variable as independent. The two linear regression functions of Y on X and of X on Y are

\[ \mu_{Y|x} = \alpha_{yx} + \beta_{yx}\, x, \qquad \mu_{X|y} = \alpha_{xy} + \beta_{xy}\, y \]

where $\beta_{yx}$ and $\beta_{xy}$ are the population regression coefficients of Y on X and of X on Y
respectively. Their corresponding least squares sample regression lines are

\[ \hat{y} = a_{yx} + b_{yx}\, x, \qquad \hat{x} = a_{xy} + b_{xy}\, y \]

where $b_{yx}$ and $b_{xy}$ are the sample regression coefficients of Y on X and of X on Y
respectively. The least squares estimates are

\[ b_{yx} = \frac{s_{xy}}{s_x^2}, \qquad a_{yx} = \bar{y} - b_{yx}\,\bar{x} \]

\[ b_{xy} = \frac{s_{xy}}{s_y^2}, \qquad a_{xy} = \bar{x} - b_{xy}\,\bar{y} \]

The regression equation of Y on X becomes

\[ \hat{y} = \bar{y} + b_{yx}(x - \bar{x}) \]

and the regression equation of X on Y becomes

\[ \hat{x} = \bar{x} + b_{xy}(y - \bar{y}) \]

The regression coefficients are related to the correlation coefficient as

\[ b_{yx} = r\,\frac{s_y}{s_x}, \qquad b_{xy} = r\,\frac{s_x}{s_y} \]

Thus the regression equation of Y on X becomes

\[ \hat{y} = \bar{y} + r\,\frac{s_y}{s_x}(x - \bar{x}) \]

and the regression equation of X on Y becomes

\[ \hat{x} = \bar{x} + r\,\frac{s_x}{s_y}(y - \bar{y}) \]

Since $s_x$ and $s_y$ are positive, the sign of $s_{xy}$, $b_{yx}$, $b_{xy}$ and $r_{xy}$ will always be the same. Note also
that if any one of these four quantities is zero then all the others must equal zero. A positive sign of
$r_{xy}$ indicates that X and Y are directly related. A direct relationship between X and Y is
associated with an upward sloping regression line; that is, as one variable increases the other variable
also increases. A negative sign of $r_{xy}$ indicates that X and Y are inversely related. An inverse
relationship between X and Y is associated with a downward sloping regression line; that is, as
one variable increases, the other variable decreases.

Theorem 14.4 In the correlation analysis the two regression lines intersect at the point
$(\bar{x}, \bar{y})$.

Theorem 14.5 The correlation coefficient r is the slope of the regression lines for standard
scores.

Theorem 14.6 The graphs of the regression lines of Y on X and X on Y are identical if all
the points of the given sample lie on a straight line.
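
The relations of this section (the two least squares lines, their intersection at the point of means, and r as the signed geometric mean of the two regression coefficients) can be checked on a small sample. The sketch below is only an illustration; the function name, dictionary keys and data are assumptions.

```python
from math import sqrt, copysign

def fit_both_lines(xs, ys):
    """Least squares lines of Y on X and of X on Y, plus r, using divisor n."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    b_yx = s_xy / var_x          # slope of Y on X
    b_xy = s_xy / var_y          # slope of X on Y
    return {
        "x_bar": x_bar, "y_bar": y_bar,
        "a_yx": y_bar - b_yx * x_bar, "b_yx": b_yx,
        "a_xy": x_bar - b_xy * y_bar, "b_xy": b_xy,
        "r": s_xy / sqrt(var_x * var_y),
    }

# Illustrative data (an assumption made for this sketch).
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]
fit = fit_both_lines(xs, ys)

# Both lines pass through the point of means (Theorem 14.4).
y_at_mean = fit["a_yx"] + fit["b_yx"] * fit["x_bar"]
x_at_mean = fit["a_xy"] + fit["b_xy"] * fit["y_bar"]

# r is the geometric mean of the slopes, carrying their common sign.
r_from_slopes = copysign(sqrt(fit["b_yx"] * fit["b_xy"]), fit["b_yx"])
print(fit, y_at_mean, x_at_mean, r_from_slopes)
```

Here `y_at_mean` should reproduce the mean of y, `x_at_mean` the mean of x, and `r_from_slopes` the value stored under `"r"`, apart from rounding error.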







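Example 14.7 The following data were obtained for a sample of 10 men from a height and
weight distribution.

$\bar{x} = 70$, $\bar{y} = 155$, $\sum (x_i - \bar{x})^2 = 120$,
$\sum y_i^2 = 240550$, $\sum (x_i - \bar{x})(y_i - \bar{y}) = 150$

Calculate the covariance, the correlation coefficient and the two regression lines.

Solution. The variance of X, variance of Y, covariance and correlation coefficient are

\[ s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{120}{10} = 12 \]

\[ s_y^2 = \frac{\sum y_i^2}{n} - \bar{y}^2 = \frac{240550}{10} - (155)^2 = 30 \]

\[ s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n} = \frac{150}{10} = 15 \]

\[ r = \frac{s_{xy}}{s_x s_y} = \frac{15}{\sqrt{(12)(30)}} = 0.79 \]

The estimated regression line of Y on X is $\hat{y} = a_{yx} + b_{yx}\, x$.
The least squares estimates of $a_{yx}$ and $b_{yx}$ are

\[ b_{yx} = \frac{s_{xy}}{s_x^2} = \frac{15}{12} = 1.25 \]

\[ a_{yx} = \bar{y} - b_{yx}\,\bar{x} = 155 - 1.25(70) = 67.5 \]

The best fitted line of Y on X is $\hat{y} = 67.5 + 1.25x$.

The estimated regression line of X on Y is $\hat{x} = a_{xy} + b_{xy}\, y$.
The least squares estimates of $a_{xy}$ and $b_{xy}$ are

\[ b_{xy} = \frac{s_{xy}}{s_y^2} = \frac{15}{30} = 0.5 \]

\[ a_{xy} = \bar{x} - b_{xy}\,\bar{y} = 70 - 0.5(155) = -7.5 \]

The best fitted line of X on Y is $\hat{x} = -7.5 + 0.5y$.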


Example 14.8 Find the coefficient of correlation if the two regression coefficients have the
following values

(i) 0.45 and 0.8, (ii) -0.1 and -0.4.

Solution.
(i) $r = (+/-)\sqrt{b_{yx} \times b_{xy}} = +\sqrt{(0.45)(0.8)} = 0.6$
(ii) $r = (+/-)\sqrt{b_{yx} \times b_{xy}} = -\sqrt{(-0.1)(-0.4)} = -0.2$







Example 14.9 The coefficient of correlation for a sample of 20 pairs of observations is 0.6.
If $\bar{x} = 12$, $\bar{y} = 20$, $s_x = 1.5$ and $s_y = 2$, estimate the lines of regression. Estimate
the mean of Y for x = 10. Estimate the mean of X for y = 22.

Solution. The estimated regression line of Y on X is $\hat{y} = a_{yx} + b_{yx}\, x$.
The least squares estimates of $a_{yx}$ and $b_{yx}$ are

\[ b_{yx} = \frac{r s_y}{s_x} = \frac{0.6(2)}{1.5} = 0.8 \]

\[ a_{yx} = \bar{y} - b_{yx}\,\bar{x} = 20 - 0.8(12) = 10.4 \]

The best fitted line of Y on X is

\[ \hat{y} = 10.4 + 0.8x \]

For x = 10, we have

\[ \hat{y} = 10.4 + 0.8(10) = 18.4 \]

The estimated regression line of X on Y is $\hat{x} = a_{xy} + b_{xy}\, y$.
The least squares estimates of $a_{xy}$ and $b_{xy}$ are

\[ b_{xy} = \frac{r s_x}{s_y} = \frac{0.6(1.5)}{2} = 0.45 \]

\[ a_{xy} = \bar{x} - b_{xy}\,\bar{y} = 12 - 0.45(20) = 3 \]

The best fitted line of X on Y is

\[ \hat{x} = 3 + 0.45y \]

For y = 22, we have

\[ \hat{x} = 3 + 0.45(22) = 12.9 \]
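
The steps of Example 14.9 can be sketched in code. This is a minimal sketch under the relations $b_{yx} = r s_y/s_x$ and $b_{xy} = r s_x/s_y$; the function name is hypothetical.

```python
def lines_from_r(r, x_bar, y_bar, s_x, s_y):
    """Return (a_yx, b_yx) and (a_xy, b_xy) from r, the means and the standard deviations."""
    b_yx = r * s_y / s_x
    b_xy = r * s_x / s_y
    a_yx = y_bar - b_yx * x_bar
    a_xy = x_bar - b_xy * y_bar
    return (a_yx, b_yx), (a_xy, b_xy)

# Figures of Example 14.9.
(a_yx, b_yx), (a_xy, b_xy) = lines_from_r(0.6, 12, 20, 1.5, 2)

y_hat = a_yx + b_yx * 10   # estimated mean of Y for x = 10
x_hat = a_xy + b_xy * 22   # estimated mean of X for y = 22
print(round(y_hat, 1), round(x_hat, 1))
```

Note that the two slopes use different ratios of the standard deviations, so the line of X on Y is not obtained by solving the line of Y on X for x.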


Example 14.10 The following results are given from paired data of two variables
X and Y.

Estimate of variance of X = 9
Estimated regression line of X on Y: 40x - 18y = 214
Estimated regression line of Y on X: 8x - 10y = -66

Find (i) the coefficient of correlation between X and Y, (ii) the standard
deviation of Y, (iii) the mean values of X and Y.

Solution. (i) The estimated regression line of X on Y is

\[ 40x - 18y = 214 \;\Rightarrow\; \hat{x} = \frac{214}{40} + \frac{18}{40}\,y \;\Rightarrow\; b_{xy} = \frac{18}{40} = 0.45 \]

The estimated regression line of Y on X is

\[ 8x - 10y = -66 \;\Rightarrow\; \hat{y} = \frac{66}{10} + \frac{8}{10}\,x \;\Rightarrow\; b_{yx} = \frac{8}{10} = 0.8 \]

The estimate of the correlation coefficient between X and Y is

\[ r = (+/-)\sqrt{b_{yx} \times b_{xy}} = +\sqrt{(0.8)(0.45)} = 0.6 \]

(ii) $s_x^2 = 9 \;\Rightarrow\; s_x = \sqrt{9} = 3$

\[ b_{yx} = \frac{r s_y}{s_x} \;\Rightarrow\; 0.8 = \frac{0.6\, s_y}{3} \;\Rightarrow\; 0.6\, s_y = 0.8(3) \;\Rightarrow\; s_y = 4 \]

(iii) Both the estimated regression lines pass through the point $(\bar{x}, \bar{y})$. Thus

\[ 40\bar{x} - 18\bar{y} = 214 \qquad (i) \]
\[ 8\bar{x} - 10\bar{y} = -66 \qquad (ii) \]

Multiplying (ii) by 5 and subtracting it from (i),

\[ (40\bar{x} - 18\bar{y}) - (40\bar{x} - 50\bar{y}) = 214 - (-330) \;\Rightarrow\; 32\bar{y} = 544 \;\Rightarrow\; \bar{y} = 17 \]

Putting this value of $\bar{y}$ in (ii), we have

\[ 8\bar{x} - 10(17) = -66 \;\Rightarrow\; \bar{x} = 13 \]
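
The reasoning of Example 14.10 can be sketched as follows. This is an illustration only: the function name and the representation of each line as a triple (A, B, C) meaning Ax + By = C are assumptions, and the sketch assumes the correlation is not zero.

```python
from math import sqrt, copysign

def analyse_lines(x_on_y, y_on_x, var_x):
    """Recover r, s_y and the means from the two regression equations.

    Each line is a triple (A, B, C) standing for A*x + B*y = C.
    """
    a1, b1, c1 = x_on_y
    a2, b2, c2 = y_on_x
    b_xy = -b1 / a1                      # slope after solving the X-on-Y line for x
    b_yx = -a2 / b2                      # slope after solving the Y-on-X line for y
    r = copysign(sqrt(b_yx * b_xy), b_yx)
    s_x = sqrt(var_x)
    s_y = b_yx * s_x / r                 # from b_yx = r * s_y / s_x
    # Both lines pass through the point of means: solve the 2x2 system (Cramer's rule).
    det = a1 * b2 - a2 * b1
    x_bar = (c1 * b2 - c2 * b1) / det
    y_bar = (a1 * c2 - a2 * c1) / det
    return r, s_y, x_bar, y_bar

# Equations of Example 14.10: 40x - 18y = 214 and 8x - 10y = -66, with variance of X = 9.
r, s_y, x_bar, y_bar = analyse_lines((40, -18, 214), (8, -10, -66), 9)
print(round(r, 2), round(s_y, 2), x_bar, y_bar)
```

The point of means found this way can be verified by substituting it back into both equations.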
14.7.4 Correlation and Causation. It is necessary to consider the sampling distribution of the
sample statistic R to decide whether or not we should accept the hypothesis that the variables in
the population are related. But aside from this technical aspect of a relation between two
variables, it is necessary for a statistician to consider whether or not correlation indicates a cause
and effect relationship. It is possible to correlate the temperature of Lahore city with the birth rate
and it is possible that a high positive correlation may be found showing that when the temperature
is high, the birth rate is high, and when the temperature is low the birth rate is low.

There is no meaning to such a correlation. There is no causal relationship between the
two phenomena. This example illustrates that you can correlate anything, and there are chances
you may obtain a high correlation which may have no significant meaning at all. A high
correlation simply tells us that the data we have collected are consistent with the hypothesis we set
up. That is, it supports our hypothesis. Any one of the following situations may have brought about a
high correlation.

(i) X is the cause of Y.

(ii) Y is the cause of X.

(iii) There is a third factor Z that affects X and Y such that they show a close relation.

(iv) The correlation between X and Y may be due to chance.

Only by more thorough investigation can we come to some conclusion as to whether or
not X is the cause of Y.
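
Situation (iii) can be illustrated by a small simulation. The sketch below is hypothetical throughout: the model (X and Y each equal to a common factor Z plus independent noise), the noise level, the sample size and the seed are assumptions chosen only for illustration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

random.seed(0)                      # fixed seed so the run is repeatable
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]       # common third factor Z
x = [zi + random.gauss(0, 0.5) for zi in z]      # X depends on Z only
y = [zi + random.gauss(0, 0.5) for zi in z]      # Y depends on Z only

r_xy = pearson_r(x, y)
# Once the part due to Z is removed, what remains of X and Y is unrelated.
r_residual = pearson_r([xi - zi for xi, zi in zip(x, z)],
                       [yi - zi for yi, zi in zip(y, z)])
print(round(r_xy, 2), round(r_residual, 2))
```

In this model neither X nor Y influences the other, yet the correlation between them is high, while the correlation between the parts not explained by Z stays close to zero.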





Exercise 14.2 


1. (a) Differentiate between regression and correlation problems, giving examples. 
(b) Define the terms correlation and product moment co-efficient of correlation. 


(c) For a set of 50 pairs of observations on variables X and Y, we have
$\sum (x_i - \bar{x})(y_i - \bar{y}) = 450$. Find the covariance.
( $s_{xy} = 9$ )
2. (a) The simple correlation coefficient $r = s_{xy}/(s_x s_y)$ is given as

\[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \]

The following table gives the ages of husbands and the ages of wives at the time of their
marriage.

Couple — i |) (23 aa Se a es OO 
Husband's age) 2).)|525 29 SOMO Ne SINE a2 NESS NIETO IGS 
Wife’sage 9), ||| 205-22) 24 eezomee 23 Ime DOMmaINcO mI 


Calculate the coefficient of correlation by using the above formula.
( r = 0.82 )
(b) The simple correlation coefficient $r = s_{xy}/(s_x s_y)$ is given as

\[ r = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sqrt{\{\sum x_i^2 - n\bar{x}^2\}\{\sum y_i^2 - n\bar{y}^2\}}} \]

The following table gives the demand and supply of a commodity.
Supply $x_i$ | 400 200 700 100 500 300 600
Demand $y_i$ | 50 60 20 70 40 30 10
Calculate the coefficient of correlation by using the above formula.
( r = -0.857 )


(c) The simple correlation coefficient $r = s_{xy}/(s_x s_y)$ is given as

\[ r = \frac{\sum x_i y_i - (\sum x_i)(\sum y_i)/n}{\sqrt{\{\sum x_i^2 - (\sum x_i)^2/n\}\{\sum y_i^2 - (\sum y_i)^2/n\}}} \]

The following table gives the traffic density and accident rate.
Traffic density $x_i$ | 30 35 40 45 50 60 70 80 90
Accident rate $y_i$ | 2 4 5 5 8 15 24 30 32
Calculate the coefficient of correlation by using the above formula.
( r = 0.983 )


(d) The simple correlation coefficient $r = s_{xy}/(s_x s_y)$ is given as

\[ r = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{\sqrt{\{n\sum x_i^2 - (\sum x_i)^2\}\{n\sum y_i^2 - (\sum y_i)^2\}}} \]

The following table gives the number of persons employed and cloth manufactured in a
textile mill.
Persons employed $x_i$ | 137 209 113 189 176 200 219
Cloth manufactured $y_i$ | 23 47 22 40 39 51 49
Calculate the coefficient of correlation by using the above formula.
( r = 0.963 )
3. (a) For a set of 22 pairs of observations, we have
$\sum x_i = 983$, $\sum y_i = 409$, $\sum x_i^2 = 61339$, $\sum y_i^2 = 8475$, $\sum x_i y_i = 15811$
Find the product moment correlation coefficient for the data.
( r = -0.6325 )
(b) For a sample of 20 pairs of observations, we have
$\bar{x} = 2$, $\bar{y} = 8$, $\sum x_i^2 = 180$, $\sum y_i^2 = 3424$, $\sum x_i y_i = 604$
Calculate the coefficient of correlation.
( r = 0.6133 )
(c) For a set of 8 pairs of observations, we have
$\sum x_i = 448$, $\sum y_i = 472$, $\sum y_i^2 = 29958$, $\sum x_i y_i = 26762$, $s_x = 16.6$
Compute the product moment correlation coefficient.
( r = 0.15 )
4. (a) For a set of 50 pairs of observations, the standard deviations of x and y are 4.5 and
3.5 respectively. If the sum of products of deviations of x and y values from their 


respective means be 420, find the Karl Pearson's coefficient of correlation. 
(r = 0.53) 


(6) Foragiven set of data,wehave ss? = 9.102, s = 2.204, 5, = 1.694 


Find the product moment correlation coefficient for the data. 


( r = 0.378 )
5. (a) For a given set of data, we have r = 0.48, $s_{xy} = 36$, $s_x^2 = 16$. Find $s_y$.
( $s_y = 18.75$ )
(b) For a given set of data, we have
r = 0.5, $\sum (x_i - \bar{x})(y_i - \bar{y}) = 120$, $s_y = 8$, $\sum (x_i - \bar{x})^2 = 90$
Find the number of pairs of values.
( n = 10 )
6. (a) A computer, while calculating the correlation coefficient between two variables X and
Y from 25 pairs of observations, obtained the following results.
$\sum x_i = 125$, $\sum x_i^2 = 650$, $\sum y_i = 100$, $\sum y_i^2 = 460$, $\sum x_i y_i = 508$
It was, however, later discovered at the time of checking that he had
copied down two pairs as:            while the correct values were:
$x_i$ | 6 8                              $x_i$ | 8 6
$y_i$ | 14 6                             $y_i$ | 12 8
Obtain the correct value of the coefficient of correlation.
( r = 0.67 )
(b) The following data show the marks in economics and marks in statistics obtained by ten
students.
Student i | 1 2 3 4 5 6 7 8 9 10
Economics $x_i$ | 78 36 96 25 75 82 90 62 65 39
Statistics $y_i$ | 84 51 91 60 68 62 86 58 53 47
(i) Compute the coefficient of correlation.
(ii) Again compute the coefficient of correlation by taking the deviations of
variable X from 50 and of variable Y from 60.
(iii) Do the results in (i) and (ii) agree?
{ (i) 0.775, (ii) 0.775, (iii) Yes }
(c) Compute the correlation coefficient between the variables X and Y represented in the
following table:
$x_i$ | 2 4 5 6 8 11
$y_i$ | 18 12 10 8 7 5
Multiply each $x_i$ value by 2 and add 6. Multiply each $y_i$ value by 3 and subtract 15.
Find the correlation coefficient between the two new sets of values, explaining why you
do or do not obtain the same result as above.
( -0.92 )
7. (a) Interpret the meaning when
r = -1, r = 0, r = 1
(b) Sketch scatter diagrams which illustrate:
(i) positive linear correlation, (ii) perfect positive linear correlation,
(iii) negative linear correlation, (iv) perfect negative linear correlation,
(v) no correlation, between two variables X and Y.
8. (a) From the data given below, calculate the coefficient of correlation between the ages of
husbands and ages of wives at the time of their marriage.
Couple i | 1 2 3 4 5 6 7 8 9 10
Husband's age $x_i$ | 28 27 28 23 29 30 36 35 33 31
Wife's age $y_i$ | 27 20 22 18 21 29 29 28 29 27
Find the regression coefficients. Verify that r is the geometric mean of the two
regression coefficients.
( r = 0.82; $b_{yx} = 0.89$, $b_{xy} = 0.75$ )







(b) The two regression coefficients have the following values; find r.
(i) $b_{yx} = 0.86$, $b_{xy} = 0.95$
(ii) $b_{yx} = -0.52$, $b_{xy} = -1.02$
{ (i) r = 0.90, (ii) r = -0.73 }
(c) Find the two regression coefficients in each of the following cases. 
(aD a= 17.0, Dsy; = 32:8, ys, 
Lx? = 49.64, Ly? = 182, n=’ 8 
(ii) n = 10, $\sum (x_i - \bar{x})^2 = 170$, $\sum (y_i - \bar{y})^2 = 140$,
$\sum (x_i - \bar{x})(y_i - \bar{y}) = 92$
(iii) $\sum (x_i - \bar{x})(y_i - \bar{y}) = 148$, $s_x = 7.933$, $s_y = 16.627$, n = 15


(iv) n = 8, X= y = 5, 2;%; ¥; = 364, 
(xc tae Swe, X(y,- Fy) = 56. 
()) ere 0.97; S20 = 7 11,00- See= 14-36 


{ (i) $b_{yx} = 2.06$, $b_{xy} = 0.47$; (ii) $b_{yx} = 0.54$, $b_{xy} = 0.66$; (iii) $b_{yx} = 0.16$,
$b_{xy} = 0.04$; (iv) $b_{yx} = 0.64$, $b_{xy} = 1.5$; (v) $b_{yx} = 0.81$, $b_{xy} = 1.16$ }
9. (a) Explain why the regression line of Y on X is not necessarily the same as the regression
line of X on Y. How would you decide which is the appropriate regression in any
particular situation? Answer the following.
(i) When do the two lines coincide?
(ii) When are they at right angles?
{ (i) Exact linear relation. (ii) Uncorrelated X, Y (i.e., $\rho = 0$) }
(b) Calculate the coefficient of correlation and obtain the lines of regression from the
following data.
Price $x_i$ | 3 4 5 6 7 8 9 10 11 12
Demand $y_i$ | 25 24 20 20 19 17 16 13 10 6
( r = -0.98, $\hat{y} = 31.45 - 1.93x$, $\hat{x} = 16 - 0.5y$ )
(c) Given the following data:
n = 100, $\sum x_i = 5000$, $\sum y_i = 6000$,
$\sum x_i y_i = 300300$, $\sum x_i^2 = 250400$, $\sum y_i^2 = 360900$
Calculate
(i) $s_x$, $s_y$ and r,
(ii) regression lines,
(iii) estimate the value of y for x = 55.
{ (i) $s_x = 2$, $s_y = 3$, r = 0.5, (ii) $\hat{y} = 22.5 + 0.75x$, $\hat{x} = 30.2 + 0.33y$,
(iii) 63.75 }




10. (a) Given the following data:
n = 10, $\sum x_i = 120$, $\sum y_i = 250$, $\sum x_i y_i = 3070.7$, $s_x = 3.5$, $s_y = 7.2$
Calculate regression lines.
( $\hat{y} = 18.04 + 0.58x$; $\hat{x} = 8.50 + 0.14y$ )
(b) Given that the means and variances of two series X and Y are
Mean:
Variance:
The correlation coefficient between X and Y is 0.75. Estimate the most plausible
value of Y for x = 40 and most plausible value of X for y = 58.
( $\hat{y} = 15.5 + 0.9x$, 51.5; $\hat{x} = 1.25 + 0.625y$, 37.5 )
(c) If the mean height of 500 fathers is 68.65 inches with standard deviation of 2.8 inches
and the mean height of their youngest sons is 69.65 inches with standard deviation of
2.85 inches and the coefficient of correlation between them is 0.52, obtain the two
equations of the lines of regression in the simplest form.
( $\hat{y} = 33.27 + 0.53x$; $\hat{x} = 33.13 + 0.51y$ )

11. (a) Given the following data:
$\bar{x} = 54$, $\bar{y} = 28$, $b_{yx} = -1.5$, $b_{xy} = -0.2$
Show that the two estimated lines of regression intersect at the point $(\bar{x}, \bar{y})$. Estimate
the value of X when Y = 30 and the value of Y when X = 55.
Hint: Show that the estimated value of X for $Y = \bar{y} = 28$ is 54 and the estimated
value of Y for $X = \bar{x} = 54$ is 28.
( $\hat{y} = 26.5$; $\hat{x} = 53.6$ )
(b) For a given set of data, the least squares regression lines are
Estimated regression line of Y on X: $\hat{y} = 20.8 - 0.219x$
Estimated regression line of X on Y: $\hat{x} = 162 - 0.785y$
Find the product moment correlation coefficient for the data.
( r = -0.415 )

12. (a) For the following set of data, use $u_i = x_i - 1000$ and $v_i = (y_i - 250)/5$ to find the
product moment correlation coefficient and the least squares lines of regression of Y on
X and of X on Y.
$x_i$ | 1000 1012 1009 1007 1010 1015 1010 1011
( 0.583; $\hat{y} = -1386.1346 + 1.6235x$; $\hat{x} = 956.326 + 0.2096y$ )
(b) On each of 30 items, two measurements are made on the variables X and Y. The
following summations are given
$\sum x_i = 15$, $\sum y_i = -6$, $\sum x_i^2 = 61$, $\sum y_i^2 = 90$, $\sum x_i y_i = 56$
Calculate the product moment correlation coefficient and obtain the lines of regression
of Y on X and X on Y. If the variable X is replaced by U where $u_i = (x_i - 1)/2$,
find the correlation coefficient between U and Y and the regression lines of Y on U
and U on Y.
( 0.856; $\hat{y} = -0.751 + 1.10x$; $\hat{x} = 0.633 + 0.664y$; 0.856; $\hat{y} = 0.351 + 2.21u$;
$\hat{u} = -0.184 + 0.332y$ )
(c) The following table shows the marks in statistics and mathematics obtained by 10
students from a large group of students.
Marks in Statistics $x_i$ | 75 80 93 65 87 71 98 68 84 77
Marks in Mathematics $y_i$ | 82 78 86 72 91 80 95 72 89 74
Estimate the linear regression function considering
(i) X as independent variable,
(ii) Y as independent variable.
( $\hat{y} = 29.13 + 0.661x$; $\hat{x} = -14.39 + 1.15y$ )
13. (a) A random sample of 20 pairs of observations $(x_i, y_i)$ gave the following
$\bar{x} = 2$, $\bar{y} = 8$, $\sum x_i^2 = 180$, $\sum y_i^2 = 1424$, $\sum x_i y_i = 404$
Estimate the linear regression function taking (i) X as independent variable,
(ii) Y as independent variable.
{ $\hat{y} = 6.32 + 0.84x$; $\hat{x} = -2.67 + 0.583y$ }
(b) Given the following data:
n = 5, $\sum x_i = 15$, $\sum y_i = 25$, $\sum (x_i - \bar{x})^2 = 10$, $\sum (y_i - \bar{y})^2 = 26$,
$\sum (x_i - \bar{x})(y_i - \bar{y}) = 13$. Determine the two regression lines.
( $\hat{y} = 1.1 + 1.3x$; $\hat{x} = 0.5 + 0.5y$ )

(c) The correlation coefficient between the two variables X and Y is r = 0.60.
If $s_x = 1.50$, $s_y = 2.00$, $\bar{x} = 10$ and $\bar{y} = 20$, find the equations of the two
regression lines of Y on X and X on Y.
( $\hat{y} = 12 + 0.8x$; $\hat{x} = 1 + 0.45y$ )

Exercise 14.2 
| Objective Questions _ 
1. _—“Fill in the blanks. 
(j) The ———— is a relationship that describes the dependence 
of the expected value of the dependent random variable for a 
given value of the independent non-random variable. (regression) 


(ii) The variable, that forms the basis of estimation, is called 
wy ——_. (regressor) 


Simple Linear Regression and Correlation 


(iti) 


(iv) The ————— diagram is a set of points in a rectangular co- 

ordinate system, where each point represents an observed pair 
7 of values. 

(v) The principle of least squares is used for finding the 
a and b of the parameters o and B. 

(vi) | The —————- regression line always passes through (x, V ). 

2. Mark the statements as true or false. 

(i) The simple linear regression model contains two parameters o 
and f. 

(ii) The simple linear regression model contains four parameters 
a, B, Hy} x and o°. 

(iii) The simple linear regression model is simple in that there is 
only one independent variable. 

(iv) The parameter a is called the slope and the parameter B is 
the intercept of the regression line. 

(v) The regression coefficient is denoted by @. 

(vit) The parameter a is called the y-intercept of the regression 
line. | 

(vii) In a regression analysis the independent variable is always 
prefixed while the dependent variable is random. 

(vill) The principle of least squares says that the sum of squares of 
the residuals of observed values from their corresponding 
estimated values should be the least possible. 

(ix) The principle of least squares is used for finding the estimates 
a and b of the parameters a and B. 

(x) The constant b estimates the parameter B representing the 
slope of the regression line. 

(xi) The regression coefficient b is independent of change in 
origin and scale. | 

(xii) The estimated regression equation of Y on x is used to 
estimate the mean value of Y fora given value of x. 

3. Fill in the blanks. | 

(i) The correlation analysis is possible when both the variables X 
and Y are 

(ii) If the two variables move in the —————. direction. the © 
correlation is positive. | 

(iii) If the two variables move in ————— directions, the 


The variable, whose resulting value depends upon the selected 
value of the independent variable, is called 7 


correlation is negative. 


213 


( regressand) 


(scatter) 


(estimates) 


(estimated) 


(false) 


(true) 
(true) 3 


(false) 
(false) 


(true) 


(artis) 


(true) 
(true) 
(true) 
(false) 


(true) 


(random) 
(same) 


(opposite) 


(iv) The correlation coefficient is __________ of the change in
origin and unit of measurement.

(v) The correlation coefficient r is the __________ mean of the
two regression coefficients.

(vi) r = 0 indicates that the two variables are linearly __________.

4. Mark off the following statements true or false.

(i) The strength of covariability between two random variables is
called correlation.

(ii) The sample correlation coefficient R is a point estimator of the
population correlation coefficient ρ.

(iii) The correlation coefficient r is not symmetrical with respect to
X and Y.

(iv) The correlation coefficient changes with a change in origin.

(v) The correlation coefficient is not affected by change in origin.

(vi) The correlation coefficient is not independent of the origin and
the unit of measurement.

(vii) The correlation coefficient is a pure number which is unitless.

(viii) The correlation coefficient r always lies between −1 and 1.

(ix) r = 1 indicates perfect positive correlation between the two
variables and slope is positive.

(x) r = −1 indicates perfect negative correlation between the two
variables and slope is negative.

5. Mark off the following statements true or false.

(i) r = 0 indicates that one regression line is horizontal and the
other regression line is vertical.

(ii) The correlation coefficient r is the geometric mean of the two
regression coefficients.

(iii) Each of the two estimated regression lines passes through the
point (x̄, ȳ).

(iv) We can always estimate the X and Y values from the
regression equation of Y on X.

(v) The regression coefficient of X on Y is −1.2 and of Y on
X is 0.3.

(vi) The regression coefficient of X on Y is −1.2 and of Y on
X is −0.3.

(vii) The regression coefficient of X on Y, regression coefficient
of Y on X and correlation coefficient have same sign.

(viii) If the regression coefficient of X on Y is −1.2 and of Y on
X is −0.3, the correlation coefficient is 0.6.

(ix) The regression coefficient of X on Y is always equal to the
regression coefficient of Y on X.


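Answers:
3. (iv) independent, (v) geometric, (vi) independent
4. (i) true, (ii) true, (iii) false, (iv) false, (v) true, (vi) false, (vii) true, (viii) true,
   (ix) true, (x) true
5. (i) true, (ii) true, (iii) true, (iv) false, (v) false, (vi) true, (vii) true, (viii) false,
   (ix) false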











ASSOCIATION 


Many experiments, particularly in the social sciences, result in observations that are only
classified into categories, so that the data consist of frequency counts for the categories. For
example, people may be classified into income groups as very rich, moderate, or poor;
manufactured items may be classified as being in excellent, good, poor, or scrap condition; in a
survey of job compatibility, employed persons may be classified as being satisfied, neutral, or
dissatisfied with their jobs; in plant breeding, the offspring of a cross fertilization may be
grouped into several genotypes; rainfall may be classified as heavy, moderate, or light; each
household may be classified as owning no cars, one car, or two or more cars. Our aim here is to
present some inferential procedures that can be used to study data that are classified into multiple
categories.


15.1 MULTINOMIAL POPULATIONS 


When each element of a population is assigned to one and only one of more than two 
attribute categories, the population is called a multinomial population. 


15.2 ATTRIBUTE (QUALITATIVE VARIABLE) 


A characteristic which varies only in quality from one individual to another is called an
attribute. Examples of attributes are: marital status, education level, blindness, smoking, richness,
beauty, etc. It is not possible to measure an attribute quantitatively. The quantitative data relating
to an attribute may be obtained simply by noting its presence or absence in the objects, and then
counting how many do or do not possess that attribute.


15.2.1 Class and Class Frequency. A class is a set of objects which share a given
characteristic. A class frequency is the number of observations (or objects) which fall
in a class.


15.2.2 Classification of Objects. The objects (or individuals) can be divided into two distinct, 
mutually exclusive and complementary classes according to whether the objects do or do not 
possess a particular attribute. This process of dividing the objects into two mutually exclusive 
classes is called dichotomy. 


If several attributes are noted, the process of classification may be continued indefinitely.
The objects that are classified according to whether they do or do not possess the first attribute
can further be subdivided according to whether they do or do not possess the second attribute, and
the objects of each of these subclasses can still further be subdivided according to whether they do
or do not possess the third attribute, and so on, every class being divided into two subclasses at
each step. For example, the members of the population of district Lahore may be classified
according to sex as males or females; the members of each sex may be further subdivided according
to marital status as married or unmarried, which results in the classes married males, unmarried
males, married females and unmarried females; the members of these four classes may be still
further subdivided according to educational status as literate or illiterate.




15.2.3 Notations and Terminology. For theoretical study it is necessary to have some
notations to represent different classes and their class frequencies. The capital Latin letters A, B,
... are used to denote the attributes and their presence. The Greek letters α, β, ... are used
to denote the absence of these attributes. Thus A will denote that the object possesses the
attribute A and α will denote that the object does not possess the attribute A; B will denote
that the object possesses the attribute B, and β will denote that the object does not possess the
attribute B. Hence "α" means "not A" and "β" means "not B".

Class frequencies will be denoted by enclosing the class symbols in brackets. Thus
(A) denotes the number of objects possessing the attribute A; (αB) denotes the number of
objects possessing the attribute B but not the attribute A.

The attributes denoted by A, B, ... are called positive attributes and their contraries
denoted by α, β, ... are called negative attributes. Thus the classes A, B and AB
represented by positive attributes are called positive classes; the classes α, β and αβ
represented by negative attributes are called negative classes; and the classes Aβ, αB, etc.
represented by both positive as well as negative attributes are called contrary classes.


15.2.4 Order of Classes. The order of a class is the number of attributes specifying the
class, e.g., a class specified by one attribute is known as a class of order 1, the classes
specified by two attributes are called classes of order 2, and the classes specified by three
attributes are known as classes of order 3. The total number of observations, denoted by n, is
called the frequency of the class of order zero since no attributes are specified.


In the study of only one attribute A, we have the following frequencies

Frequency of the class of order zero :   n
Frequencies of the classes of order 1 :  (A), (α)

In the study of two attributes A and B, we have the following frequencies

Frequency of the class of order zero :   n
Frequencies of the classes of order 1 :  (A), (α), (B), (β)
Frequencies of the classes of order 2 :  (AB), (Aβ), (αB), (αβ)

These observed frequencies can be expressed in the form of a 2 x 2 table as

                        Attribute B
Attribute A         B          β          Total
    A             (AB)       (Aβ)         (A)
    α             (αB)       (αβ)         (α)
  Total           (B)        (β)           n


15.2.5 Number of Class Frequencies. If we include the total number of observations n as a
frequency of the class of order zero, then in general, for k attributes the total number of class
frequencies would be $3^k$. Thus in case of only one attribute the total number of class
frequencies would be $3^1 = 3$; for two attributes it is $3^2 = 9$, and so on.














15.2.6 Ultimate Class Frequency. The frequencies of classes of the highest order are called
ultimate class frequencies. The number of ultimate class frequencies for k attributes is given by
$2^k$. Thus in case of two attributes the number of ultimate classes is $2^2 = 4$, and so on.


If n is included as a positive class, then for k attributes the number of positive classes
is the same as the number of ultimate classes. For two attributes, the positive classes are n, (A),
(B), (AB) and the ultimate classes are (AB), (Aβ), (αB), (αβ).


It is interesting to note a very simple result that any class frequency can always be 
expressed in terms of the class frequencies of higher order. Any class can always be expressed as 
a sum of its two subclasses produced by dichotomizing it for the study of a new characteristic. 
For example, in the study of two attributes, we may have: 


n   = (A) + (α)                 n   = (B) + (β)
(A) = (AB) + (Aβ)               (B) = (AB) + (αB)
(α) = (αB) + (αβ)               (β) = (Aβ) + (αβ)
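
The relations above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `class_frequencies` and its argument names are hypothetical, and the four ultimate class frequencies used in the call are those of Example 15.4.

```python
def class_frequencies(ab, a_beta, alpha_b, alpha_beta):
    """Build all nine class frequencies for two attributes from the four
    ultimate class frequencies (AB), (Aβ), (αB), (αβ)."""
    a = ab + a_beta                # (A) = (AB) + (Aβ)
    alpha = alpha_b + alpha_beta   # (α) = (αB) + (αβ)
    b = ab + alpha_b               # (B) = (AB) + (αB)
    beta = a_beta + alpha_beta     # (β) = (Aβ) + (αβ)
    n = a + alpha                  # n   = (A) + (α)
    return {
        "n": n,
        "(A)": a, "(α)": alpha, "(B)": b, "(β)": beta,
        "(AB)": ab, "(Aβ)": a_beta, "(αB)": alpha_b, "(αβ)": alpha_beta,
    }


# Ultimate class frequencies taken from Example 15.4.
freqs = class_frequencies(ab=110, a_beta=290, alpha_b=90, alpha_beta=510)
for name, value in freqs.items():
    print(name, "=", value)
```

The returned dictionary holds the $3^2 = 9$ class frequencies for two attributes, and n can equally be recovered as (B) + (β).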


15.2.7 Consistence of data. The class frequencies that have been observed in one and the
same population are said to be consistent if they conform with one another and do not conflict
with each other. In the study of attributes, no class frequency can ever be negative. If any class
frequency is negative, the data are said to be inconsistent. Inconsistency may be due to wrong
counting, inaccurate additions or subtractions, or misprints. The necessary and sufficient
condition for the consistence of a set of class frequencies is that no ultimate class frequency
should be negative. To test the consistence of data, we calculate the ultimate class frequencies
from the given data; if any of the ultimate class frequencies turns out to be negative, the data are
inconsistent. If no ultimate class frequency is negative, the data are consistent. It is, however,
important to note that the consistence of data is no proof of accurate count, accurate additions or
subtractions or the absence of misprints.
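
The consistence test described above can be sketched as follows. The helper names `ultimate_frequencies` and `is_consistent` are hypothetical; the data are those of the report in Exercise 15.1 (100 students, 80 passed in Statistics, 70 in Mathematics, 48 in both), with A standing for passing Statistics and B for passing Mathematics.

```python
def ultimate_frequencies(n, a, b, ab):
    """Ultimate class frequencies for two attributes, computed from the
    positive class frequencies n, (A), (B) and (AB)."""
    return {
        "(AB)": ab,
        "(Aβ)": a - ab,          # (Aβ) = (A) - (AB)
        "(αB)": b - ab,          # (αB) = (B) - (AB)
        "(αβ)": n - a - b + ab,  # (αβ) = n - (A) - (B) + (AB)
    }


def is_consistent(n, a, b, ab):
    """Data are consistent when no ultimate class frequency is negative."""
    return all(value >= 0 for value in ultimate_frequencies(n, a, b, ab).values())


report = ultimate_frequencies(n=100, a=80, b=70, ab=48)
print(report)
print("consistent:", is_consistent(100, 80, 70, 48))
```

Here (αβ) comes out negative, so the report is inconsistent, in agreement with the answer given in the exercise.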


15.3. INDEPENDENCE OF ATTRIBUTES 


If, in a sample of size n, the class frequencies of the attributes A, B and AB are
represented by (A), (B) and (AB), then we have

Proportion of individuals possessing A  = (A) / n

Proportion of individuals possessing B  = (B) / n

Proportion of individuals possessing AB = (AB) / n

The two attributes A and B are said to be independent if

Proportion of AB = (Proportion of A)(Proportion of B)

$$\frac{(AB)}{n} = \frac{(A)}{n} \cdot \frac{(B)}{n}$$

that is, if

$$(AB) = \frac{(A)(B)}{n}$$




In case of independence of attributes A and B, the 2 x 2 table must have the form

                          Attribute B
Attribute A         B             β             Total
    A           (A)(B)/n      (A)(β)/n          (A)
    α           (α)(B)/n      (α)(β)/n          (α)
  Total           (B)           (β)              n


Example 15.1 If there are 144 A's and 384 B's in 1024 observations, how many
(i) AB's and (ii) αβ's will there be if A and B are independent?
Solution. We have n = 1024, (A) = 144, (B) = 384.
For A and B to be independent, we must have

(i)  $(AB) = \dfrac{(A)(B)}{n} = \dfrac{(144)(384)}{1024} = 54$

(ii) (α) = n − (A) = 1024 − 144 = 880
     (β) = n − (B) = 1024 − 384 = 640

     $(\alpha\beta) = \dfrac{(\alpha)(\beta)}{n} = \dfrac{(880)(640)}{1024} = 550$
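
As a quick check of Example 15.1, the whole 2 x 2 table expected under independence can be computed from n, (A) and (B). This is a sketch only; `independent_table` is a hypothetical helper name.

```python
def independent_table(n, a, b):
    """Cell frequencies that a 2 x 2 table must have if A and B are independent."""
    alpha, beta = n - a, n - b   # (α) = n - (A), (β) = n - (B)
    return {
        "(AB)": a * b / n,
        "(Aβ)": a * beta / n,
        "(αB)": alpha * b / n,
        "(αβ)": alpha * beta / n,
    }


# Data of Example 15.1: n = 1024, (A) = 144, (B) = 384.
table = independent_table(n=1024, a=144, b=384)
print(table)
```

The four cells add up to n, as they must for any 2 x 2 table.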


Example 15.2 If the A's are 60% and the B's are 40% of the whole number of observations,
what must be the percentage of AB's in order that we may conclude that A and B are
independent?
Solution. Let n = 100, then (A) = 60, (B) = 40.
For A and B to be independent, we must have

$$(AB) = \frac{(A)(B)}{n} = \frac{(60)(40)}{100} = 24$$

There must be 24% AB's to justify the conclusion that A and B are independent.


Example 15.3 Given the following data, find whether A and B are independent or
associated.

(i)  n = 150, (A) = 30, (B) = 60, (AB) = 12
(ii) (AB) = 256, (αβ) = 144, (Aβ) = 48, (αB) = 768

Solution.
(i) Observed frequency of AB's = (AB) = 12

    Expected frequency of AB's = $\dfrac{(A)(B)}{n} = \dfrac{(30)(60)}{150} = 12$

    Since $(AB) = \dfrac{(A)(B)}{n}$, the attributes A and B are independent.

(ii) We have the 2 x 2 table as

                          Attribute B
Attribute A          B               β              Total
    A           (AB) = 256       (Aβ) = 48       (A) = 304
    α           (αB) = 768       (αβ) = 144      (α) = 912
  Total         (B) = 1024       (β) = 192       n = 1216

    Observed frequency of AB's = (AB) = 256

    Expected frequency of AB's = $\dfrac{(A)(B)}{n} = \dfrac{(304)(1024)}{1216} = 256$

    Since $(AB) = \dfrac{(A)(B)}{n}$, the attributes A and B are independent.


15.4 ASSOCIATION OF ATTRIBUTES 
( CORRELATION OF QUALITATIVE VARIABLES ) 


The two attributes A and B are said to be associated if they are not independent, i.e.,

$$(AB) \neq \frac{(A)(B)}{n}$$

Association of attributes may be classified as positive or negative.


15.4.1 Positive Association. The two attributes A and B are positively associated, or simply
associated, if

$$(AB) > \frac{(A)(B)}{n}$$


15.4.2 Negative Association. The two attributes A and B are negatively associated, or simply
disassociated, if

$$(AB) < \frac{(A)(B)}{n}$$


It should be noted that disassociation does not imply independence. 
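
The three cases (independence, positive association, negative association) can be sketched as a single comparison. The function name `nature_of_association` is hypothetical. The comparison is done as n(AB) against (A)(B) so that only whole numbers are involved. The first two calls use the data of Examples 15.3 (i) and 15.4; the third uses Exercise 15.1, 5 (b)(iii), where n = 700 and (B) = 262 are derived here from the given (A), (α), (AB) and (αβ).

```python
def nature_of_association(n, a, b, ab):
    """Compare the observed (AB) with its expected value (A)(B)/n.

    Multiplying both sides by n avoids fractions: (AB) > (A)(B)/n is the
    same condition as n(AB) > (A)(B).
    """
    observed, expected = n * ab, a * b
    if observed == expected:
        return "independent"
    if observed > expected:
        return "positively associated"
    return "negatively associated"


print(nature_of_association(150, 30, 60, 12))       # Example 15.3 (i)
print(nature_of_association(1000, 400, 200, 110))   # Example 15.4
print(nature_of_association(700, 415, 262, 147))    # Exercise 15.1, 5 (b)(iii)
```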


15.4.3 Complete Association and Disassociation. There will be complete (or perfect positive)
association between two attributes A and B if one of them cannot occur without the other,
though the other may occur without the one, that is, if

(i)   (A) = (B)  ⇒  all A's are B's and all B's are A's
(ii)  (A) < (B)  ⇒  all A's are B's
(iii) (B) < (A)  ⇒  all B's are A's


There will be complete disassociation (or perfect negative association) between two
attributes A and B if none of the A's is B's and none of the α's is β's.




15.4.4 Coefficient of Association. The strength of association between two attributes A and
B is measured by a coefficient of association.

Yule's coefficient of association, denoted by Q, is defined as

$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$$

This coefficient lies between −1 and +1.

If Q = 0, the two attributes are independent.
If Q = 1, the two attributes are completely associated.
If Q = −1, the two attributes are completely disassociated.


Example 15.4 Given the following:

(AB) = 110, (αB) = 90, (Aβ) = 290, (αβ) = 510

Discuss association.
Solution. We have the 2 x 2 table as

                          Attribute B
Attribute A          B               β              Total
    A           (AB) = 110       (Aβ) = 290      (A) = 400
    α           (αB) = 90        (αβ) = 510      (α) = 600
  Total         (B) = 200        (β) = 800       n = 1000

Observed frequency of AB's = (AB) = 110

Expected frequency of AB's = $\dfrac{(A)(B)}{n} = \dfrac{(400)(200)}{1000} = 80$

Since $(AB) > \dfrac{(A)(B)}{n}$, the attributes A and B are positively associated.


Example 15.5 1660 candidates appeared for a competitive examination and 422 were
successful. 256 had attended a coaching class and of these 150 came out successful. Find the
coefficient of association between success and attending a coaching class.

Solution. Let A represent success and B represent attending a coaching class, then we have

n = 1660, (A) = 422, (B) = 256, (AB) = 150

                                   Attribute B
Attribute A            B                           β                          Total
    A           (AB) = 150              (Aβ) = 422 − 150 = 272       (A) = 422
    α           (αB) = 256 − 150 = 106  (αβ) = 1404 − 272 = 1132     (α) = 1660 − 422 = 1238
  Total         (B) = 256               (β) = 1660 − 256 = 1404      n = 1660

$$Q = \frac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}
    = \frac{(150)(1132) - (272)(106)}{(150)(1132) + (272)(106)}
    = \frac{140968}{198632} \approx 0.71$$
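
Yule's coefficient is easy to evaluate mechanically. The sketch below, in which `yule_q` is a hypothetical function name, reproduces the arithmetic of Example 15.5 and also evaluates Q for the table of Example 15.4.

```python
def yule_q(ab, a_beta, alpha_b, alpha_beta):
    """Yule's coefficient of association Q for a 2 x 2 table of attributes."""
    cross_1 = ab * alpha_beta    # (AB)(αβ)
    cross_2 = a_beta * alpha_b   # (Aβ)(αB)
    return (cross_1 - cross_2) / (cross_1 + cross_2)


# Example 15.5: success (A) and attending a coaching class (B).
q_coaching = yule_q(ab=150, a_beta=272, alpha_b=106, alpha_beta=1132)
print(round(q_coaching, 2))

# Example 15.4: the positive association found there gives a positive Q.
q_example_4 = yule_q(ab=110, a_beta=290, alpha_b=90, alpha_beta=510)
print(round(q_example_4, 2))
```

Note that the function divides by zero when both cross products vanish; a fuller implementation would have to decide how to treat that degenerate table.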








Exercise 15.1

1. (a) Distinguish between attribute and variable. Define positive classes, negative classes and
       ultimate classes.
   (b) Given the following ultimate class frequencies, find the frequencies of the positive and
       negative classes and the whole number of observations n.
       (AB) = 95, (Aβ) = 55, (αB) = 85, (αβ) = 45
       { n = 280, (A) = 150, (B) = 180, (α) = 130, (β) = 100 }
2. (a) Given the following frequencies of the positive classes, find the frequencies of the
       ultimate classes.
       n = 250, (A) = 80, (B) = 100, (AB) = 70
       { (Aβ) = 10, (αB) = 30, (αβ) = 140, (AB) = 70 }
   (b) Measurements are made on a thousand husbands and a thousand wives. If the
       measurements of husbands exceed the measurements of the wives in 800 cases for one
       measurement, in 700 cases for another, and in 660 cases for both measurements, in
       how many cases will both the measurements on the wife exceed the measurements on the
       husband?
       ( 160 )
   (c) Given that (A) = (α) = (B) = (β) = n/2, show that
       (i) (AB) = (αβ)   (ii) (Aβ) = (αB)
3. (a) Define the consistence of the data.
   (b) Find whether the data given below in each case are consistent.
       (i)   n = 120, (A) = 82, (AB) =
       (ii)  n = 50, (A) = 40, (B) = 32, (AB) =
       (iii) n = 1000, (AB) = 200, (Aβ) = 350, (αB) = 500
       { (i) Not consistent since (AB) = −8, (ii) Not consistent since (αβ) = −7,
         (iii) Not consistent since (αβ) = −50 }
   (c) Comment on the following data contained in a report: 100 students appeared in a test, of
       whom 80 passed in Statistics, 70 passed in Mathematics and 48 passed in both the
       subjects.
       { Not consistent, since (αβ) = −2 }
4. (a) What is meant by independence of attributes?
   (b) There are 240 A's and 270 B's in 600 observations. What would be the number of AB's
       if A and B are independent?
       { (AB) = 108 }
   (c) If A's are 60% and B's are 40% of the whole number of observations, what must be
       the percentage of AB's in order that we conclude that A and B are independent?
       { AB's are 24% }
5. (a) When are two attributes independent, positively associated, negatively associated?
   (b) Given the following data, determine the nature of association between the attributes A
       and B, i.e., find whether A and B are independent, positively or negatively
       associated.
       (i)  (A) = 30, (B) = 60, (AB) = 12, n = 150
       (ii) (AB) = 110, (αB) = 90, (Aβ) = 290, (αβ) = 510


       (iii) (A) = 415, (AB) = 147, (α) = 285, (αβ) = 170
       { (i) Independent, (ii) Positively associated, (iii) Negatively associated }
6. (a) What is meant by association of attributes?
   (b) Explain the difference between the following with examples.
       (i)   Attribute and variable,
       (ii)  Correlation and association,
       (iii) Positive association and negative association
7. (a) Find the association between injection against typhoid and exemption from
       attack from the following contingency table

                          Attacked      Not attacked
       Inoculated
       Not inoculated

       (Q = 0.65)
   (b) Calculate the coefficient of association between the intelligence of fathers and sons
       in the following data:
       Intelligent fathers with intelligent sons = 265
       Intelligent fathers with dull sons        = 100
       Dull fathers with intelligent sons        = 95
       Dull fathers with dull sons               = 450
       (Q = 0.85)
   (c) Find if there is any association between the tempers of brothers and sisters from the
       following data:
       Good natured brothers and good natured sisters = 1230
       Good natured brothers and sullen sisters       = 850
       Sullen brothers and good natured sisters       = 530
       Sullen brothers and sullen sisters             = 980
       (Q = 0.46)
8. (a) 750 students appeared in an examination and 470 were successful. 465 had attended
       classes and 58 of them failed. Calculate the coefficient of association to discuss
       association between attending classes and success.
       (Q = 0.92, highly positive association)
   (b) 100 students appeared in an examination, and 50 failed in Mathematics, 60 failed in
       Statistics and 40 failed in both. Find if there is any association between failing in
       Mathematics and Statistics.
       (Q = 0.71)
   (c) Can vaccination be regarded as a preventive measure for small pox from the following
       data: "Of 1482 persons in a locality exposed to small pox, 368 in all were
       attacked. Of 1482 persons, 343 persons had been vaccinated and of these 35 were
       attacked".
       (Q = −0.57)










15.5 TWO DIMENSIONAL COUNT DATA: CONTINGENCY TABLE 

A simple random sample of n elements selected from a bivariate multinomial population
that has been classified into r categories $A_1, A_2, \ldots, A_r$ of attribute A and c categories
$B_1, B_2, \ldots, B_c$ of attribute B will produce a two-way frequency table which is called an
r x c contingency table, a name due to Karl Pearson. A contingency table is made up of the
observed frequencies relative to the two attributes and their categories, and is generally
presented in the following tabular form, with rows representing the r categories
$A_1, A_2, \ldots, A_r$ of attribute A and columns representing the c categories
$B_1, B_2, \ldots, B_c$ of attribute B.
15.5.1 Cell Frequency. The number of observations falling in a particular cell is called the cell
frequency.

                         An r x c Contingency Table

                                 Attribute B
Attribute A       B_1     B_2    ...    B_j    ...    B_c      Row total
A_1               o_11    o_12   ...    o_1j   ...    o_1c     o_1.
A_2               o_21    o_22   ...    o_2j   ...    o_2c     o_2.
...
A_i               o_i1    o_i2   ...    o_ij   ...    o_ic     o_i.
...
A_r               o_r1    o_r2   ...    o_rj   ...    o_rc     o_r.
Column total      o_.1    o_.2   ...    o_.j   ...    o_.c     n





The table shows, in all, k = rc cells or categories. The symbol $O_{ij}$ denotes the
number of sample observations in the (i, j) category of attributes A and B, respectively, for
i = 1, 2, ..., r and j = 1, 2, ..., c. The entries in the table represent the realizations $o_{ij}$,
the observed frequencies, of the random variables $O_{ij}$. Note that the i-th row total is the
observed frequency of the i-th category of attribute A summed over all categories of attribute B.
Similarly, the j-th column total is the observed frequency of the j-th category of attribute B
summed over all categories of attribute A. Let

$$o_{i\cdot} = \sum_{j=1}^{c} o_{ij} \quad \text{for } i = 1, 2, \ldots, r$$

$$o_{\cdot j} = \sum_{i=1}^{r} o_{ij} \quad \text{for } j = 1, 2, \ldots, c$$

denote the row and column sums, respectively, where the "dot" notation indicates the subscript
over which summation has taken place. That is

$o_{ij}$ = Observed frequency of $A_i \cap B_j$
$o_{i\cdot}$ = Observed frequency of $A_i$, i.e., i-th row total
$o_{\cdot j}$ = Observed frequency of $B_j$, i.e., j-th column total

$$\sum_{i=1}^{r} \sum_{j=1}^{c} o_{ij} = \sum_{i=1}^{r} o_{i\cdot} = \sum_{j=1}^{c} o_{\cdot j} = n$$


15.6 TEST FOR STATISTICAL INDEPENDENCE 


In analyzing bivariate multinomial populations, the first question of interest is usually
whether the two attributes are statistically independent or whether
certain levels of one attribute tend to be associated or contingent with some levels of another
attribute. If they are independent, we know that there is no relationship between them. If it turns
out that they are not independent and a relationship does exist between the two attributes, the next
step in the analysis is to study the nature of the relationship. We begin with the first step of
the analysis, testing whether or not the two attributes are independent.


We are concerned with testing the null hypothesis that the two criteria of classification
are independent. Recall that, if two classifications are independent of each other, a cell
probability will equal the product of its respective row and column probabilities in accordance
with the multiplicative law of probability. Therefore, the null hypothesis stating that the
events $A_1, A_2, \ldots, A_r$ are independent of the events $B_1, B_2, \ldots, B_c$ can be rephrased as

$$P(A_i \cap B_j) = P(A_i)\,P(B_j) \quad \text{for all } i = 1, 2, \ldots, r \text{ and } j = 1, 2, \ldots, c$$

Thus the null and alternative hypotheses for a test of statistical independence are
Null hypothesis $H_0$: $A_i$ and $B_j$ are independent for all cells (i, j)
Alternative hypothesis $H_1$: $A_i$ and $B_j$ are not independent for some (i, j)
Here, $H_0$ represents statistical independence and $H_1$ represents statistical dependence.


The problem now becomes testing the goodness of fit for the model of independence.
We compare the observed frequencies $o_{ij}$ with the expected frequencies $E(O_{ij})$ that are
expected if the attributes are independent. Under the null hypothesis of independence of attributes
the estimate of the expected frequency $E(O_{ij})$ is

$$e_{ij} = \frac{o_{i\cdot}\, o_{\cdot j}}{n}
        = \frac{(i\text{-th row total})(j\text{-th column total})}{\text{number of observations}}$$

The test statistic then becomes

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$

which has an approximate chi-square distribution with ν = (r − 1)(c − 1) degrees of
freedom for large n.













In this case, we base the test statistic on the expected number of elements $e_{ij}$ in the
sample from each category if $H_0$ is true and the postulated proportions hold. The farther the
observed frequency $o_{ij}$ departs in either direction from the expected frequency $e_{ij}$,
the larger is $(o_{ij} - e_{ij})^2$ and hence the larger is χ². On the other hand, if there is perfect
agreement between the observed and expected frequencies, i.e., $o_{ij}$ and $e_{ij}$ are identical for
all classes, χ² = 0 because each $(o_{ij} - e_{ij})^2 = 0$. If all the observed frequencies $o_{ij}$ are
close to the expected frequencies $e_{ij}$, supporting $H_0$, the value of χ² will be near to zero; if
the $o_{ij}$ are far from the $e_{ij}$, indicating rejection of $H_0$, χ² will assume a large positive
value. It follows, therefore, that for a given level of significance α the critical region is the
upper tail of the chi-square distribution with ν = (r − 1)(c − 1) degrees of freedom, i.e.,

Critical region: $\chi^2 > \chi^2_{1-\alpha;\,(r-1)(c-1)}$


On the question of how large the sample size should be: this test is based on the
normal approximation to the binomial, and a fairly conservative rule of thumb is that the
approximation is adequate if each $e_{ij} \geq 5$. If there are not at least 5 items, the value of
chi-square is inflated because squared differences are divided by a very small expected frequency
in $\chi^2 = \sum \{ (o_{ij} - e_{ij})^2 / e_{ij} \}$. However, if some cells have too small expected
frequencies, the condition of at least 5 items in each expected frequency class can also be
accomplished by combining neighbouring row or column classes, but for each pair of rows or
columns that is combined, the number of rows or columns used for the degrees of freedom is
reduced by one.


15.6.1 Assumptions. To conduct a valid test of hypothesis for independence using data from a
contingency table, the following conditions must be met.


(i) A simple random sample of size n has been selected from a bivariate multinomial
population.


(ii) The sample size n is reasonably large so that, for each cell, the estimated expected
frequency is at least 5.
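
The test statistic and the rule of thumb on expected frequencies can be sketched in plain Python as below. The function names are hypothetical, and the observed table is the 2 x 2 table of Example 15.4 (rows A, α; columns B, β), used here only because all of its cell frequencies are given in the text. The critical value still has to be read from a chi-square table.

```python
def expected_frequencies(observed):
    """e_ij = (i-th row total)(j-th column total) / n for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom, expected table)."""
    expected = expected_frequencies(observed)
    chi2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


observed = [[110, 290],   # (AB), (Aβ)
            [90, 510]]    # (αB), (αβ)
chi2, df, expected = chi_square_independence(observed)

# Rule of thumb from 15.6.1: every expected frequency should be at least 5.
rule_ok = all(e >= 5 for row in expected for e in row)
print(chi2, df, rule_ok)
```

The same functions work for any r x c table given as a list of rows; pooling of rows or columns with small expected frequencies is left to the user in this sketch.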

15.6.2 Yates' Correction. To improve the approximation to the χ² distribution and thus be
able to obtain a more exact probability value from the χ² table, F. Yates has proposed a
correction for continuity, applicable when the criterion has a single degree of freedom. The
correction is intended to make the actual distribution of the criterion, as calculated from discrete
data, more nearly like the χ² distribution based on normal deviations. The relation Z² = χ²
between Z and χ² holds only for a single degree of freedom. The approximation calls for the
absolute value of each deviation to be decreased by 1/2, because for two celled tables, the
deviations are always equal in magnitude but opposite in sign. Therefore

$$\text{Adjusted } \chi^2 = \sum \frac{(\,|o - e| - 0.5\,)^2}{e}$$

Thus Yates' correction is analogous to the continuity correction which is applied in the
normal approximation to the binomial distribution. Without the correction there is a tendency to
underestimate the probability, which means that the probability of rejecting the hypothesis will be
increased. Adjustment results in a lower chi-square. Consequently, in testing hypotheses, it is
worthwhile only when the unadjusted χ² is greater than the tabulated χ² at the desired probability
level. When n (or e) is large the continuity correction has little effect, but when the e's are
small, it should be applied. When |o − e| is less than 0.5, the continuity correction should be
omitted.
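
A minimal sketch of the adjusted statistic for a 2 x 2 table follows. The function name `yates_chi_square` is hypothetical, and the table of Example 15.4 is reused purely as an illustration (its expected frequencies are large, so the correction changes the statistic only slightly).

```python
def yates_chi_square(observed):
    """Chi-square with Yates' continuity correction for a 2 x 2 table."""
    if len(observed) != 2 or any(len(row) != 2 for row in observed):
        raise ValueError("Yates' correction applies to a 2 x 2 table only")
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    adjusted = 0.0
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            deviation = abs(observed[i][j] - e)
            # The correction is omitted when |o - e| is less than 0.5.
            if deviation >= 0.5:
                deviation -= 0.5
            adjusted += deviation ** 2 / e
    return adjusted


table_2x2 = [[110, 290], [90, 510]]   # Example 15.4: rows A, α; columns B, β
adjusted_chi2 = yates_chi_square(table_2x2)
print(round(adjusted_chi2, 2))
```

As the text states, the adjusted value is lower than the unadjusted chi-square computed from the same table.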


15.6.3 Coefficient of Contingency. The coefficient of contingency is a measure of the strength
of association on a numerical scale, an index of association between two criteria of
classification. When the test for statistical independence leads to the conclusion of dependence,
we may wish to measure the strength of association between the two criteria of classification.
Insofar as the χ² statistic represents an overall deviation from the model of independence, it is
intuitively reasonable to use this statistic to gauge the strength of this association. We may call
χ² the "square contingency". But in applying the χ² statistic as a measure of association the
limitation is that the number of degrees of freedom attached to this statistic depends upon the
dimensionality of the contingency table. A χ² value of 16.5 in a 2 x 2 contingency table
would reflect a significant association, but this would not be so in a 6 x 8 contingency table.
Several measures of association have been proposed to adjust the χ² statistic to a common scale
that is irrespective of the dimensionality of the contingency table. We then write

$$\phi^2 = \frac{\chi^2}{n}$$

and call φ² the "mean square contingency". For the commonly used measure given below,
large values indicate a strong association and small values indicate a weak association
between the two criteria of classification.


Pearson's coefficient of mean square contingency:

$$C = \sqrt{\frac{\chi^2}{n + \chi^2}}, \qquad C_{\max} = \sqrt{\frac{q - 1}{q}}$$

where q represents the number of rows or columns, whichever is smaller, and n indicates the
sample size.
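
As a small numerical sketch (function names hypothetical), Pearson's coefficient and its maximum value can be evaluated with the figures that appear in Example 15.6, namely χ² = 54.06, n = 492 and q = 3.

```python
import math


def contingency_coefficient(chi2, n):
    """Pearson's coefficient of mean square contingency C."""
    return math.sqrt(chi2 / (n + chi2))


def max_contingency_coefficient(q):
    """Largest attainable C when q is the smaller of the numbers of rows and columns."""
    return math.sqrt((q - 1) / q)


c_value = contingency_coefficient(chi2=54.06, n=492)
c_max = max_contingency_coefficient(q=3)
print(round(c_value, 3), round(c_max, 4))
```

Comparing C with its maximum for the table at hand gives a fairer impression of strength than C alone, since C can never reach 1.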

Example 15.6 Four hundred and ninety two candidates for scientific posts gave particulars of
their university degrees and their hobbies. The degrees were in either mathematics, chemistry or
physics and the hobbies could be classified roughly as music, craftwork, reading or drama. The
data are presented concisely in the following contingency table.

                              Degree
Hobby           Mathematics     Chemistry     Physics
Music
Craftwork
Reading
Drama                10             26           44

Discuss the association between the two criteria of classification, i.e., the degrees and hobbies. If
the null hypothesis of independence is rejected, calculate Pearson's coefficient of mean
square contingency. What could be its maximum value for this contingency table?














Solution. The elements of the one-sided right tail test of hypothesis are
Null hypothesis $H_0$: The degree and hobby are independent.
Alternative hypothesis $H_1$: The degree and hobby are not independent.
Level of significance: α = 0.05 ⇒ 1 − α = 0.95
Test statistic: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(o_{ij} - e_{ij})^2}{e_{ij}}$ follows an approximate chi-square
distribution under $H_0$ with
Degrees of freedom: ν = (r − 1)(c − 1) = (4 − 1)(3 − 1) = 6
Critical value: $\chi^2_{1-\alpha;\,\nu} = \chi^2_{0.95;\,6} = 12.59$ (From Table 11)
Critical region: χ² > 12.59
Decision rule: Reject $H_0$ if χ² > 12.59, otherwise do not reject $H_0$.
Observed value: The observed frequencies $o_{ij}$ are:

                                      Degree
Hobby            Mathematics: B_1   Chemistry: B_2   Physics: B_3   Row total
Music: A_1                                                          o_1. = 124
Craftwork: A_2                                                      o_2. = 101
Reading: A_3                                                        o_3. = 187
Drama: A_4             10                 26              44        o_4. = 80
Column total       o_.1 = 77         o_.2 = 292       o_.3 = 123    n = 492


The expected frequencies $e_{ij}$ under the null hypothesis of independence are

$$e_{ij} = \frac{o_{i\cdot}\, o_{\cdot j}}{n}
        = \frac{(i\text{-th row total})(j\text{-th column total})}{\text{number of observations}}$$

which are given in the following table. Only (r − 1)(c − 1) = (4 − 1)(3 − 1) = 6 expected
frequencies are obtained through this procedure. We could work through this procedure to give
the other expected frequencies, but this is unnecessary, as the remaining frequencies can be found
by using the fact that the sub-totals and totals must agree with those in the observed data.

                                           Degree
Hobby            Mathematics: B_1          Chemistry: B_2             Physics: B_3   Row total
Music: A_1       (124)(77)/492 = 19.4      (124)(292)/492 = 73.6      31.0           124
Craftwork: A_2   (101)(77)/492 = 15.8      (101)(292)/492 = 59.9      25.3           101
Reading: A_3     (187)(77)/492 = 29.3      (187)(292)/492 = 111.0     46.7           187
Drama: A_4       12.5                      47.5                       20.0           80
Column total     77                        292                        123            492


The χ²-statistic is calculated as under

Observed frequency $o_{ij}$     Expected frequency $e_{ij}$     $(o_{ij} - e_{ij})^2 / e_{ij}$

                                                                χ² = 54.06

Conclusion: Since χ² = 54.06 > 12.59, we reject $H_0$ and conclude that the two criteria
of classification are associated.
Pearson's coefficient of mean square contingency:

$$C = \sqrt{\frac{\chi^2}{n + \chi^2}} = \sqrt{\frac{54.06}{492 + 54.06}} = 0.315$$

Maximum value of C for this contingency table:

$$C_{\max} = \sqrt{\frac{q - 1}{q}} = \sqrt{\frac{3 - 1}{3}} = 0.8165$$

Example 15.7 Discuss the association of stature of parents and off-springs for the following
data:

                                Parents
Off-springs     Very tall     Tall     Medium     Short
Very tall           20         30        20          2
Tall                14        125
Medium               3        140
Short                3         37        68        151

Solution. The elements of the one-sided right tail test of hypothesis are

Null hypothesis $H_0$: The stature of off-springs is independent of the stature of parents.
Alternative hypothesis $H_1$: The stature of off-springs is not independent of the stature of
parents.

Level of significance: α = 0.05 ⇒ 1 − α = 0.95







Test statistic: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(o_{ij} - e_{ij})^2}{e_{ij}}$ follows an approximate chi-square
distribution under $H_0$ with

Degrees of freedom: $\nu = (r - 1)(c - 1) = (4 - 1)(3 - 1) = 6$, since the first two
columns are pooled because $e_{11} = 2.88 < 5$.

Critical value: $\chi^2_{\nu;\,1-\alpha} = \chi^2_{6;\,0.95} = 12.59$ (From Table 11)

Critical region: $\chi^2 > 12.59$

Decision rule: Reject $H_0$ if $\chi^2 > 12.59$, otherwise do not reject $H_0$.

Observed value: The observed frequencies $o_{ij}$ are given in the following contingency
table:

                                      Parents
Off-springs    | Very tall: B1 | Tall: B2   | Medium: B3 | Short: B4  | Row total
Very tall: A1  |               |            |            |            | o1. = 72
Tall: A2       |               |            |            |            | o2. = 236
Medium: A3     |               |            |            |            | o3. = 433
Short: A4      |               |            |            |            | o4. = 259
Column total   | o.1 = 40      | o.2 = 332  | o.3 = 338  | o.4 = 290  | n = 1000


The expected frequencies $e_{ij}$ under the null hypothesis of independence are

$$e_{ij} = \frac{o_{i\cdot}\, o_{\cdot j}}{n} = \frac{(i\text{-th row total})\,(j\text{-th column total})}{\text{number of observations}}$$

which are given in the following table. Only (r − 1)(c − 1) = (4 − 1)(4 − 1) = 9 expected
frequencies are obtained through this procedure. We could work through this procedure to give
the other expected frequencies, but this is unnecessary, as the remaining frequencies can be found
by using the fact that the sub-totals and totals must agree with those in observed data.


                                                  Parents
Off-springs    | Very tall: B1           | Tall: B2                  | Medium: B3                | Short: B4 | Row total
Very tall: A1  | (72)(40)/1000 = 2.88    | (72)(332)/1000 = 23.90    | (72)(338)/1000 = 24.34    | 20.88     | 72
Tall: A2       | (236)(40)/1000 = 9.44   | (236)(332)/1000 = 78.35   | (236)(338)/1000 = 79.77   | 68.44     | 236
Medium: A3     | (433)(40)/1000 = 17.32  | (433)(332)/1000 = 143.76  | (433)(338)/1000 = 146.35  | 125.57    | 433
Short: A4      | 10.36                   | 85.99                     | 87.54                     | 75.11     | 259
Column total   | 40                      | 332                       | 338                       | 290       | 1000







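Combining the first and second columns of the expected frequencies, we get

                                    Parents
Off-springs    | Very tall or tall        | Medium: B3 | Short: B4
Very tall: A1  | 2.88 + 23.90 = 26.78     | 24.34      | 20.88
Tall: A2       | 9.44 + 78.35 = 87.79     | 79.77      | 68.44
Medium: A3     | 17.32 + 143.76 = 161.08  | 146.35     | 125.57
Short: A4      | 10.36 + 85.99 = 96.35    | 87.54      | 75.11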



































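                                    Parents
Off-springs    | Very tall or tall  | Medium: B3 | Short: B4
Very tall: A1  | 20 + 30 = 50       | 20         | 2
Tall: A2       | 14 + 125 = 139     |            |
Medium: A3     | 3 + 140 = 143      |            |
Short: A4      | 3 + 37 = 40        | 68         | 151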
















Observed frequency $o_{ij}$     Expected frequency $e_{ij}$     $(o_{ij} - e_{ij})^2 / e_{ij}$


$\chi^2 = 233.13$


Conclusion: Since $\chi^2 = 233.13 > 12.59$, we reject $H_0$ and conclude that the stature of
off-springs is not independent of the stature of the parents.

Pearson's coefficient of mean square contingency:

$$C = \sqrt{\frac{\chi^2}{n + \chi^2}} = \sqrt{\frac{233.13}{1000 + 233.13}} = 0.435$$
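The pooling step in this example lends itself to a small check in Python. The sketch below is illustrative only: the helper names are hypothetical, the marginal totals are those shown in the tables above, and $\chi^2 = 233.13$ is the value quoted in the text, since the full set of observed cell frequencies is not available here.

```python
import math


def expected_frequencies(row_totals, col_totals):
    """Expected cell frequencies e_ij = (row total)(column total) / n."""
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pool_first_two_columns(table):
    """Merge the first two columns of a table by adding them cell by cell."""
    return [[row[0] + row[1]] + row[2:] for row in table]


row_totals = [72, 236, 433, 259]      # off-springs: very tall, tall, medium, short
col_totals = [40, 332, 338, 290]      # parents:     very tall, tall, medium, short

expected = expected_frequencies(row_totals, col_totals)
smallest = min(min(row) for row in expected)

# Pool only when some expected frequency falls below 5, as in the text.
pooled = pool_first_two_columns(expected) if smallest < 5 else expected
degrees_of_freedom = (len(pooled) - 1) * (len(pooled[0]) - 1)

C = math.sqrt(233.13 / (1000 + 233.13))   # chi-square value quoted in the text

print(round(smallest, 2), degrees_of_freedom, round(C, 3))
```

Pooling reduces the table from 4 × 4 to 4 × 3, which is why the degrees of freedom drop from 9 to 6.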




Example 15.8 A random sample of 30 adults is classified according to the sex and the 
number of hours they watch television during a week : 





Over 25 hours 

| Under 25 hours 
Using $\alpha = 0.01$, test the hypothesis that a person's sex and time watching television are
independent.
Solution. The elements of the one-sided right tail test of hypothesis are
Null hypothesis $H_0$: The sex and time watching television are independent.
Alternative hypothesis $H_1$: The sex and time watching television are not independent.
Level of significance: $\alpha = 0.01 \;\Rightarrow\; 1 - \alpha = 0.99$
Test statistic: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(|o_{ij} - e_{ij}| - 0.5)^2}{e_{ij}}$ follows an approximate chi-square distribution
under $H_0$ with

Degrees of freedom: $\nu = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$
Critical value: $\chi^2_{\nu;\,1-\alpha} = \chi^2_{1;\,0.99} = 6.63$
Critical region: $\chi^2 > 6.63$
Decision rule: Reject $H_0$ if $\chi^2 > 6.63$, otherwise do not reject $H_0$.
Observed value: The observed frequencies $o_{ij}$ are given in the following table:





Time watching television  | Male: B1  | Female: B2 | Row total
Over 25 hours: A1         | 5         | 8          | o1. = 13
Under 25 hours: A2        | 10        | 7          | o2. = 17
Column total              | o.1 = 15  | o.2 = 15   | n = 30


The expected frequencies $e_{ij}$ under the null hypothesis are

$$e_{ij} = \frac{o_{i\cdot}\, o_{\cdot j}}{n} = \frac{(i\text{-th row total})\,(j\text{-th column total})}{\text{number of observations}}$$

Time watching television  | Male: B1            | Female: B2 | Row total
Over 25 hours: A1         | (13)(15)/30 = 6.5   | 6.5        | 13
Under 25 hours: A2        | 8.5                 | 8.5        | 17
Column total              | 15                  | 15         | 30




Now we calculate the $\chi^2$ statistic as under

Category (i, j) | Observed frequency $o_{ij}$ | Expected frequency $e_{ij}$ | $(|o_{ij} - e_{ij}| - 0.5)^2 / e_{ij}$
A1 B1           | 5                           | 6.5                         | 0.154
A1 B2           | 8                           | 6.5                         | 0.154
A2 B1           | 10                          | 8.5                         | 0.118
A2 B2           | 7                           | 8.5                         | 0.118
Total           | 30                          | 30.0                        | $\chi^2 = 0.544$

Conclusion: Since $\chi^2 = 0.544 < 6.63$, we do not reject $H_0: \pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j}$ for all
(i, j) against $H_1: \pi_{ij} \ne \pi_{i\cdot}\,\pi_{\cdot j}$ for at least one (i, j).
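The tabulated contributions (1/6.5 = 0.154 and 1/8.5 = 0.118) correspond to the continuity-corrected form of the statistic, in which 0.5 is subtracted from each absolute deviation. The following Python sketch reproduces the calculation under that reading. The function name is illustrative, and the observed frequencies 5, 8, 10 and 7 are the ones implied by the row totals and the tabulated contributions.

```python
def corrected_chi_square(observed):
    """Chi-square with the 0.5 continuity correction for a 2 x 2 table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n      # expected frequency
            chi2 += (abs(o - e) - 0.5) ** 2 / e        # corrected contribution
    return chi2


# Rows: over 25 hours, under 25 hours; columns: male, female.
observed = [[5, 8], [10, 7]]
chi2 = corrected_chi_square(observed)
reject_h0 = chi2 > 6.63        # critical value for 1 degree of freedom at alpha = 0.01

print(round(chi2, 3), reject_h0)
```

Carried at full precision the statistic is about 0.543; the figure 0.544 in the table comes from rounding each contribution to three decimals before adding. Either way the value is far below 6.63.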


Exercise 15.2 
1. (a) Define contingency table and cell frequency. What is a 2 × 2 contingency table?


(b) In an investigation into eye-colour and left or right handedness of a person, the following
results were obtained: 


Blue 85 
| Brown 


Do these results indicate, at the 5% level of significance, an association between eye
colour and left or right handedness?

(Since Adj $\chi^2 = 0.56 < 3.84 = \chi^2_{1;\,0.95}$, we do not reject $H_0$: There is no
association between eye colour and left or right handedness against $H_1$: There is
association between eye colour and handedness.)

(c) An investigation into colour-blindness and sex of a person gave the following results:
Colourblindness
Colourblind     Not colourblind


Male 36 964 
f 2 
Is there evidence, at the 5% level, of an association between the sex of a person and
whether or not they are colourblind?

(Since Adj $\chi^2 = 4.79 > 3.84 = \chi^2_{1;\,0.95}$, we reject $H_0$: There is no association
between sex of a person and colour-blindness in favour of $H_1$: There is association
between sex of a person and colour-blindness.)

2. (a) A driving school examined the results of 100 candidates who were taking their driving
test for the first time. They found that out of the 40 men, 28 passed and out of the 60
women, 34 passed. Do these results indicate, at the 5% level of significance, a
relationship between the sex of a candidate and the ability to pass first time?
(Since Adj $\chi^2 = 1.290 < 3.84 = \chi^2_{1;\,0.95}$, we do not reject $H_0$: There is no
relationship between the sex of a candidate and the ability to pass first time against
$H_1$: There is relationship between the sex and ability to pass.)

(b) Out of 1350 persons, 450 were literate and 600 had traveled beyond the limits of their
district; 300 of the literates were among those who had traveled. Find out by calculating
(i) the coefficient of association, (ii) the value of chi-square, whether there is any
association between traveling and literacy.
(Q = 0.6. Since Adj $\chi^2 = 133.65 > 3.84 = \chi^2_{1;\,0.95}$, we reject $H_0$: There is no
association between traveling and literacy in favour of $H_1$: There is association between
traveling and literacy.)

3. (a) The following are the data on a random sample of 150 chickens, divided into two
groups according to breed and into three groups according to yield of eggs.

Rhode Red
Leghorn White

Are these data consistent with the hypothesis that yield is not affected by the type of
breed?
(Since $\chi^2 = 4.07 < 5.99 = \chi^2_{2;\,0.95}$, we do not reject $H_0$: There is no association
between chicken breed and yield of eggs against $H_1$: There is association between
chicken breed and yield of eggs.)

(b) The students of a college took three courses: arts, commerce and science. The students
were classified according to the sex. The data on these students are given as follows:

Course of study
Commerce     Science

Use the chi-square test to find whether there is any association between sex and choice of
course of study.
(Since $\chi^2 = 13.888 > 5.99 = \chi^2_{2;\,0.95}$, we reject $H_0$: There is no association
between sex and course of study in favour of $H_1$: There is association between sex and
course of study.)

4. (a) The following table shows liking of three colours: pink, white and blue in samples of
males and females:








Test whether there is any relation between sex and colour.
(Since $\chi^2 = 26.3889 > 5.99 = \chi^2_{2;\,0.95}$, we reject $H_0$: There is no relation between
sex and liking of colours in favour of $H_1$: There is relation between sex and liking
of colours.)

(b) The following table gives the condition at home and condition of the children.

                          Condition at home
Condition of children     Clean     Not clean
Clean
Fairly clean
Dirty

Test for the association between the conditions at home and condition of children.
(Since $\chi^2 = 5.027 < 5.99 = \chi^2_{2;\,0.95}$, we do not reject $H_0$: There is no association
between conditions at home and condition of children against $H_1$: There is
association between conditions at home and condition of children.)

5. (a) The table given below shows the relation between the performance of students in
economics and statistics. Test the hypothesis that the performance in economics is
independent of the performance in statistics using 5% level of significance:

                          Grade in statistics
Grade in economics        High     Medium     Low
High
Medium
Low

(Since $\chi^2 = 89.2112 > 9.49 = \chi^2_{4;\,0.95}$, we reject $H_0$: There is no association
between the performance of students in economics and statistics against $H_1$: There is
association between the performance in economics and statistics.)

(b) A thousand households are taken at random and divided into three groups A, B and C,
according to the total weekly income. The following table shows the numbers in each
group having a colour television receiver, a black and white receiver, or no television
at all.

Television type           A     B     C
Colour television
Black and white
None

Calculate the expected frequencies if there is no association between total income and
television ownership. Apply a test to find whether the observed frequencies suggest that
there is such an association.
(Since $\chi^2 = 266 > 9.49 = \chi^2_{4;\,0.95}$, we reject $H_0$: There is no association between
television type and income group in favour of $H_1$: There is association between
television type and income group.)


6. (a) A random sample of 200 married men, all retired, were classified according to education
and number of children as indicated below:

                     Number of children
Education            0–1     2–3     Over 3
Elementary
Secondary
College              12

Test the hypothesis, at the 5% level of significance, that the size of family is independent of the
level of education attained by the father.
(Since $\chi^2 = 11.7194 > 9.49 = \chi^2_{4;\,0.95}$, we reject $H_0$: There is no association
between education and number of children in favour of $H_1$: There is association
between education and number of children.)

(b) A survey of 200 families known to be regular television viewers was undertaken. They
were asked which of the three television channels they watched most during an average
week. A summary of their replies is given in the following table, together with the region
in which they lived.









29 
6 11 26 7 

| eS 3 ‘2S 10 | 

Test the hypothesis that there is no association between the channel watched most and
the region.
(Since $\chi^2 = 13.446 > 12.59 = \chi^2_{6;\,0.95}$, we reject $H_0$: There is no association
between the channel and region in favour of $H_1$: There is association between the
channel and region.)

(c) The following table shows the number of employees and the condition of factory premises.

Condition of        Number of persons employed
premises            Under 50     51–150     151–250     Over 250
62
25
5

Discuss the association between the condition of premises and the number of persons
employed. Compute the coefficient of contingency.
(Since $\chi^2 = 30.06 > 12.59 = \chi^2_{6;\,0.95}$, we reject $H_0$: There is no association
between the condition of premises and the number of persons employed in favour of
$H_1$: There is association between the condition of premises and the number of persons
employed. C = 0.22)






15.7 RANK CORRELATION. 


The correlation between ranks of individuals for both the variables X and Y is called 
rank correlation. A special case of correlation is when both the variables X and Y consist of 
sets of ranks. Suppose, for example, that two judges have ranked the same set of n objects 
according to some characteristic of interest. We are interested in determining whether the ranks 
assigned to the objects by one judge are related to or show any agreement with ranks assigned to 
the same objects by another judge. 


15.7.1 Derivation of Spearman's Coefficient of Rank Correlation. Suppose that we have a
sample of n individuals from a continuous bivariate population and two measurements for
variables X and Y are made on each individual. We have n pairs of observations $(a_1, b_1)$,
$(a_2, b_2)$, ..., $(a_n, b_n)$. These values for two variables can be ranked in separate ordered series.
Let $x_1, x_2, \ldots, x_n$ be the ranks of $a_1, a_2, \ldots, a_n$ and $y_1, y_2, \ldots, y_n$ be the ranks of
$b_1, b_2, \ldots, b_n$. The coefficient of rank correlation $r_s$ is the ordinary correlation coefficient
between the two sets of ranks. Then the coefficient of rank correlation is

$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$

where $d = x_i - y_i$ is the difference between the two ranks of the i-th individual.


The coefficient $r_s$ always lies between −1 and +1. This formula is called Spearman's coefficient
of rank correlation, in honour of Charles Edward Spearman. Spearman's rank correlation
coefficient is equivalent to Pearson's product moment correlation coefficient computed for ranks
rather than the original observations. This nonparametric procedure can be useful in correlation
analysis even when the basic data are not available in the form of numerical magnitudes but
ranks can be assigned. The ranks may be assigned in order from high to low, with 1
representing the highest, 2 the next highest, etc. (or in order from low to high, with 1
representing the lowest, 2 the next lowest, etc.).
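As a quick illustration of the formula, the Python sketch below computes Spearman's coefficient directly from two lists of ranks. The function name is illustrative, the formula assumes there are no tied ranks, and the data are the five sacks of coal from Exercise 15.3: the true order of weight is A, B, C, D, E, and the weight lifter's order A, D, B, E, C gives the sacks A to E the ranks 1, 3, 5, 2, 4.

```python
def spearman_rank_correlation(ranks_x, ranks_y):
    """Spearman's r_s = 1 - 6 * sum(d^2) / (n (n^2 - 1)), assuming no tied ranks."""
    n = len(ranks_x)
    sum_d2 = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


true_ranks = [1, 2, 3, 4, 5]      # sacks A, B, C, D, E ranked by actual weight
lifter_ranks = [1, 3, 5, 2, 4]    # ranks the weight lifter gave to A, B, C, D, E

r_s = spearman_rank_correlation(true_ranks, lifter_ranks)
print(r_s)
```

Here the rank differences are 0, −1, −2, 2, 1, so the sum of squared differences is 10 and $r_s = 1 - 60/120 = 0.5$. Identical rankings give +1 and exactly reversed rankings give −1.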

Example 15.9 Using Spearman's formula calculate coefficient of rank correlation for the 
following data giving ranks to the measured quantities. 








1. (a) 
(5) 


(c) 


3. (a) 


(5) 


4. (a) 


$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}, \qquad \text{with } n = 6 \text{ so that } n(n^2 - 1) = 6(6^2 - 1)$$

Exercise 15.3


What is rank correlation?
The following table shows how 10 students, arranged in alphabetical order, were
ranked according to their achievements in both laboratory and lecture portions of a
statistics course. Find the coefficient of rank correlation.


Laboratory | 8 z- --9----4 an et ee ee 
Lecture 9 ay 1W ea Sonar? 5 oe 2 
($r_s$ = 0.8545)


The ranks of the same 10 students in Mathematics and Economics were as follows:

(1, 6); (2, 5); (3, 1); (4, 4); (5, 2); (6, 7); (7, 8); (8, 10);
(9, 3); (10, 9); the two numbers within brackets denoting the ranks of the same
students in Mathematics and Economics respectively. Calculate the rank correlation
coefficient for proficiencies of this group in two subjects.
($r_s$ = 0.45)


Five sacks of coal A, B, C, D and E have different weights, with A being
heavier than B, B being heavier than C, and so on. A weight lifter ranks the sacks
(heaviest first) in the order A, D, B, E, C. Calculate a coefficient of rank correlation.


($r_s$ = 0.5)


Seven army recruits A, B, C, D, E, F and 2 were given two ueche ate aptitude tests. 
Their orders of merit in each test were 


Order of merit _ Ist 2nd 3rd 4th Sth 6th Tth 
First test ae: Fig A' a ae Box.. ED 
Second test D F E ABM. GG APA 


Find Spearman's coefficient of rank correlation between the two orders and comment 
briefly on the correlation obtained. 


(r, = —0.036, Very little negative correlation ) 

Ten competitors in a beauty contest are ranked by three judges in the leet order 
Competitor Lz fy C D ES otk GistH: oF J 
Judge X 1 H6 10% in. mae eT 8 
Judge Y | 3 SS yeaa 7 10s 2 eel 9 
‘Tudge ZS! 6 a OR ee ae os ae Oe 


Use Spearman's rank correlation coefficient to discuss which pair of judges have the
nearest approach to common tastes in beauty.

($r_{XY} = -0.21$, $r_{YZ} = -0.30$, $r_{XZ} = 0.64$; this indicates that the judges X and Z
have the nearest approach to common tastes in beauty.)





(b) The following table shows the grade point average awarded to six children in a
competition by two different judges.


Child A B C= ane 2 F 
fageXunne| POs oS) asi) - 98 Tino 
Judge ee 19.4 1.9 9.6 8.9 6.9 
Calculate the coefficient of rank correlation by Spearman's formula.
($r_s$ = 0.26)
(c) The following table shows the marks of six candidates in two subjects.
Candidate TA B CG = SDx F_ 
Mathematics x; | 38 62 S56 aes edo: 59 48 
Statistics y, | 6 89 84 60 73 69 


(i) Calculate the coefficient of rank correlation.
(ii) Comment on the value of your result.
{ (i) 0.886, (ii) High positive correlation }


Exercise 15.4 
Objective Questions 
1. Fill in the blanks. 
(i) A characteristic which varies in quantity from one individual
to another is called a ______. (variable)
(ii) A characteristic which varies in quality from one individual
to another is called an ______. (attribute)
(iii) The observations made on objects regarding an attribute are
called ______ data. (qualitative)
(iv) ______ is a process of dividing the objects into two
mutually exclusive classes of an attribute. (Dichotomy)
(v) The degree of linear relationship between the two variables
is called ______. (correlation)
(vi) The degree of relationship between the two attributes is
called ______. (association)
(vii) The two attributes A and B are ______, if
(AB) = (A)(B)/n. (independent)
(viii) The two attributes A and B are ______, if
(AB) ≠ (A)(B)/n. (associated)
(ix) The two attributes A and B are ______ associated, if
(AB) > (A)(B)/n. (positively)
(x) The two attributes A and B are ______ associated, if
(AB) < (A)(B)/n. (negatively)
(xi) The coefficient of association, denoted by Q, is a measure
of association between the two ______. (attributes)
(xii) If the coefficient of association equals 0, the two attributes
A and B are ______. (independent)
(xiii) If the coefficient of association is not equal to 0, the two
attributes A and B are ______. (associated)
(xiv) If the coefficient of association equals −1, the two attributes
A and B are completely ______. (dissociated)
(xv) If the coefficient of association equals 1, the two attributes
A and B are completely ______. (associated)
(xvi) A ______ table consisting of r rows and c columns is
made up of the observed frequencies relative to two
attributes and their categories. (contingency)
(xvii) The two attributes are said to be ______, if for every cell
of a contingency table the observed frequency $o_{ij}$ is equal
to expected frequency $e_{ij}$. (independent)
(xx) For an r × c contingency table, the $\chi^2$-statistic has degrees
of freedom ν = ______. ((r − 1)(c − 1))
(xxiv) The larger are the differences between the observed and
expected frequencies, the larger will be the value of $\chi^2$
which leads to the ______ of $H_0$ of independence. (rejection)
(xxv) The rejection of $H_0$ of independence indicates that the two
criteria of classification are ______. (associated)
(xxvi) In a chi-square test for independence, no expected frequency
should be ______ than 5. (less)
2. Mark off the following statements as false or true.
(i) A characteristic which varies in quantity from one individual to
another is called an attribute. (false)
(ii) The quantitative data relating to an attribute may be obtained
simply by noting its presence or absence in the objects. (true)
(iii) The presence of attributes is denoted by capital Latin letters and
their absence by Greek or small letters. (true)
(iv) The class frequencies of the highest order are called ultimate
class frequencies. (true)
(v) The two attributes A and B are associated, if
(AB) = (A)(B)/n. (false)
(vi) The two attributes A and B are positively associated, if
(AB) < (A)(B)/n. (false)
(vii) The coefficient of correlation, denoted by r, is a measure of the
strength of linear relationship between two variables. (true)
(viii) The coefficient of association, denoted by Q, is a measure of
association between the two attributes. (true)
(ix) The coefficient of association always lies between −1 and 1. (true)
(x) A contingency table consisting of r rows and c columns is
made up of the observed frequencies relative to two attributes
and their categories. (true)
(xi) The disassociation of two attributes means their independence. (false)
(xii) A measure of the discrepancy between the observed and
expected frequencies is called a chi-square ($\chi^2$) test of
independence. (true)
(xiii) The value of $\chi^2$-statistic is always non-negative. (true)
(xiv) The larger are the differences between the observed and expected
frequencies, the larger will be the value of $\chi^2$ which leads to
rejection of $H_0$ of independence. (true)
(xv) The rejection of $H_0$ of independence indicates that two criteria
of classification are associated. (true)


16 ANALYSIS OF
TIME SERIES


16.1 TIME SERIES 


The sequence $y_1, y_2, \ldots, y_n$ of n observations of a variable Y, recorded in
accordance with their time of occurrence $t_1, t_2, \ldots, t_n$, is called a time series. Symbolically,
the variable Y can be expressed as a function of time t as

$$y = f(t) + e$$

where f(t) is a completely determined or specified sequence that follows a systematic pattern of
variation and e is a random error that follows an irregular pattern of variation.

Signal. The signal is a systematic component of variation in a time series.

Noise. The noise is an irregular component of variation in a time series. 

Therefore, a time series is a sequence of observations, on a variable, that are arranged in 
chronological order. The observations in a time series are usually made at equidistant points of 
time. Examples of a time series are: the hourly temperature recorded at a weather bureau, the total 
annual yield of wheat over a number of years, the monthly sales of a fertilizer at a store, the 
enrolment of students in various years in a college, the daily sales at a departmental store, etc. 
16.1.1 Historigram. A historigram is a graphic representation of a time series that reveals the
changes that occurred at different time periods. A first step in the prediction or forecast of a time
series involves an examination of the set of past observations. The construction of a historigram 
involves the following steps: 

(i) Using an appropriate scale, take time t along x-axis as an independent variable.
(ii) Using an appropriate scale, plot the observed values of variable Y as a dependent 
variable against the given points of time. 
(iii) Join the plotted points by line segments to get the required historigram. 
Example 16.1 Draw a historigram to show the population of Pakistan in various census years.

Census year     Population (million)

Fig. 16.1 Population of Pakistan (population in millions plotted against the census years 1951, 1961, 1972, 1981 and 1998)







Solution. The population of Pakistan in different census years is represented by a historigram as
shown in Fig. 16.1.


16.2 COMPONENTS OF A TIME SERIES 


The examples of time series suggest that a typical time series may be composed of the 
following four components: / 


(i) Secular trend (T)
(ii) Seasonal variations (S)
(iii) Cyclical fluctuations (C)
(iv) Irregular movements (I)


These are the basic components of a time series, each of which is regarded as the result
of a well defined distinct cause. A time series is not necessarily composed of all these four
components.


16.2.1 Secular (Long-term) Trend. The secular trend is a line or curve that shows the general 
tendency of a time series. It represents a relatively smooth, steady, and gradual movement of a 
time series in the same direction (upward or downward). It shows the general increase or 
decrease in a sequence of observations, and reflects the effect of the forces operating over a fairly 
long period of time. Examples of secular trend are: 


(i) |The decline in death rate due to advancement in science. 
(ii) A continually increasing demand for smaller automobiles. 
(iii) A need for increased wheat production due to a constant increase in population. 


16.2.2 Seasonal Variations. The seasonal variations are short term movements that represent 
the regularly recurring changes in a time series. These variations indicate a repeated pattern of 
identical changes in the data that tend to recur regularly during a period of one year or less. These 
changes are repeated with the same pattern within a specific time period, called the periodicity. 
Seasonal variations may have a fixed periodicity, such as daily, weekly, monthly, or yearly.
These changes are periodic in nature and their influence upon a specific time series is fairly
regular, both in respect of length (time) and amplitude (size).


These variations reflect the effect caused by the recurring events. The main
causes of seasonal variations are seasons, religious festivals and social customs.
Examples of seasonal variations are:

(i) The weekly statements of sales in a store.
(ii) An increase in consumption of electricity in summer.
(iii) An after Eid sale in a departmental store.
(iv) An increase in sales of cold drinks during summer.

Fig. 16.2 The seasonal pattern of cold drinks sales (cold drinks in thousands plotted against quarter)



16.2.3. Cyclical Fluctuations. The cyclical fluctuations are the long term oscillations about the 
trend. These are the periodic up-and-down movements in a time series that tend to recur over a 
long period of time. The cyclic patterns tend to vary in length ( time ) and amplitude ( size ) and 
they are differentiated from the seasonal variations by the fact that they do not have a fixed 
periodicity. Although these variations are recurring, they are less predictable than seasonal
variations and secular trend; therefore, they have a more dangerous effect on business and
economic activity. These fluctuations reflect the effect caused by a so called business cycle. A
business cycle has the following four phases: 


(i) Trough ( Depression )

(ii) Expansion ( Recovery ) 

(iii) Peak ( Boom or Prosperity ) 
(iv) Recession ( Contraction ) 


A trough is the lowest point relative to the rest of the particular cycle. After the
downswing has run its course, the expansion phase reverses direction and starts rising.
The upswing eventually levels off and reaches its peak. This is the highest point relative
to the particular cycle. Finally, the upswing starts to turn downward. We refer to this
following phase as a recession.

Fig. 16.3 The four phases of business cycle


16.2.4 Irregular Movements. The irregular movements are unpredictable changes that
indicate the effect of random events. The examples of random events are wars, floods,
earthquakes, strikes, fires, elections, etc. The irregular movements are unsystematic,
non-recurring, accidental and unusual in nature. These variations are also known as erratic,
accidental or random variations. Examples of irregular movements are:

(i) A steel strike, delaying production for a week.
(ii) A fire in a factory, delaying production for 3 weeks.
16.3 ANALYSIS OF TIME SERIES 


The analysis of a time series is the decomposition of a time series into its different
components for their separate study. The process of analysing a time series is intended to isolate
and measure its various components. The study of a time series is mainly required for estimation
and forecasting. An ideal forecast should be based on forecasts of the various types of fluctuations.
While performing the analysis, the components of a time series are assumed to follow either the
multiplicative model or the additive model. Let

Y = Original observation,
T = Trend component,        S = Seasonal component,
C = Cyclical component,     I = Irregular component.




In the multiplicative model, it is assumed that the value Y of a composite series is the
product of the four components T, S, C and I. Symbolically,

$$Y = T \times S \times C \times I$$

where the component T is given in original units of Y but the other components S, C and I
are expressed as percentage unitless index numbers.

In the additive model, it is assumed that the value Y of the composite series is the sum
of the four components T, S, C and I. Symbolically,

$$Y = T + S + C + I$$

where the components T, S, C and I are all given in the original units of Y. Conventionally,
the multiplicative model is considered as the standard model for analysis of a time series.
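The difference between the two models can be made concrete with a small Python sketch. All figures below are hypothetical and chosen only for illustration. In the multiplicative function the indices S, C and I are assumed to be quoted as percentages (100 meaning "no effect") and are divided by 100 before multiplying; in the additive function every component is in the units of Y.

```python
def multiplicative_model(trend, seasonal_pct, cyclical_pct, irregular_pct):
    """Y = T x S x C x I, with S, C and I given as percentage index numbers."""
    return trend * (seasonal_pct / 100) * (cyclical_pct / 100) * (irregular_pct / 100)


def additive_model(trend, seasonal, cyclical, irregular):
    """Y = T + S + C + I, with every component in the units of Y."""
    return trend + seasonal + cyclical + irregular


# Hypothetical trend value of 200 units with illustrative component values.
y_mult = multiplicative_model(200, 110, 95, 102)   # 200 x 1.10 x 0.95 x 1.02
y_add = additive_model(200, 20, -10, 4)            # 200 + 20 - 10 + 4

print(round(y_mult, 2), y_add)
```

An index above 100 (or a positive additive component) pushes the observation above the trend, and an index below 100 (or a negative component) pulls it below.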


16.3.1 Coded Time Variable. We can take the origin at the beginning of a time series
and assign x = 0 to the first period and then number other periods as 1, 2, 3, ... as shown
at left of the following table. However, it is important to note that in order to simplify the
trend calculations, the time variable t is coded by

$$x = \begin{cases} (t - \bar{t})/h & \text{for odd number of periods} \\ (t - \bar{t})/h & \text{for even number of periods in units of } h \text{ period} \\ (t - \bar{t})/(h/2) & \text{for even number of periods in units of } h/2 \text{ period} \end{cases}$$

where the average $\bar{t}$ = (first period + last period)/2 and h is the constant interval in
the time variable. Since $\sum (t - \bar{t}) = 0$, we get $\sum x = 0 = \sum x^3 = \sum x^5 = \cdots$, and
so on.


The odd number of years in period 1980–1984 at the middle of the following table
has $\bar{t}$ = (1980 + 1984)/2 = 1982 as the middle point. The code for the year t is $x = t - \bar{t}$.
For t = 1982, we have $x = t - \bar{t}$ = 1982 − 1982 = 0. Thus, the coded year is zero at $\bar{t}$. For
t = 1980, we have x = 1980 − 1982 = −2. Actually, the only computation we need is that for
$\bar{t}$. Thus after entering x = 0 at the middle of an odd number of years, we assign −1, −2, ...
and so on for the years before the middle year, and 1, 2, ... and so on for the years after the
middle year as shown in the following table.

The even number of years in period 1980–1985 at the right of the following table has
$\bar{t}$ = (1980 + 1985)/2 = 1982.5 as the middle point. So x = 0 lies half way between the years
1982 and 1983. For t = 1982, we have $x = t - \bar{t}$ = 1982 − 1982.5 = −0.5. Then after
considering x = 0 at the middle of an even number of years, we assign −0.5, −1.5, −2.5,
... and so on for the years before the middle and 0.5, 1.5, 2.5, ... and so on for the
years after the middle as shown in the following table.

However, to avoid decimals in the coded years we can take the unit of measurement as
1/2 year. Then after considering x = 0 at the middle of an even number of years, we assign
−1, −3, −5, ... and so on for the years before the middle and 1, 3, 5, ... and so on
for the years after the middle as shown in the following table.




Table: Coded Year Number

Origin at beginning.       | Odd number of years.       | Even number of years.
The starting year is       | The middle year is         | x = 0 is at the centre of
coded x = 0                | coded x = 0                | two middle years

        Coded year         |         Coded year         |         Coded year         Coded year
        in units as        |         in units as        |         in units as        in units as
Year    one year           | Year    one year           | Year    one year           1/2 year
t       x                  | t       x = t − t̄          | t       x = t − t̄          x = (t − t̄)/(1/2)
1980    0                  | 1980    −2                 | 1980    −2.5               −5
1981    1                  | 1981    −1                 | 1981    −1.5               −3
1982    2                  | 1982    0                  | 1982    −0.5               −1
1983    3                  | 1983    1                  | 1983    0.5                1
1984    4                  | 1984    2                  | 1984    1.5                3
                           |                            | 1985    2.5                5




16.4 ESTIMATION OF SECULAR TREND 


It has been stated earlier that one of the component forces that determine the value of the
variable at any period of time is the secular trend. The secular trend is measured for the purpose
of prediction or projection into the future. The secular trend can be represented either by a straight
line or by some type of smooth curve. It is measured by the following methods:


(i) Method of free hand curve 
(ii) Method of semi-averages 
(iii) Method of moving averages 
(iv) Method of least squares 


16.4.1 Method of free hand curve. The secular trend is measured by the method of free hand
curve in the following steps.

(i) Using an appropriate scale, take the time periods along x-axis, as an independent
variable.
(ii) Using an appropriate scale, plot the points for observed values of the variable Y as a
dependent variable against the given time periods.
(iii) Join these plotted points by line segments to get a historigram.
(iv) Keeping in view the up and down fluctuations of the graph, draw a free hand smooth
curve or a straight line through the historigram in such a way that it indicates the general
trend of the time series.
(v) Instead of locating the line simply by eye looking at the graph, the average $\bar{y}$ of original
values may be used as the trend value $y'$ at the middle of the time period. Plot this
average in the middle of the time period and draw the required trend line or curve
through this point, as it is a reasonable condition that $\bar{y}'$ should be equal to $\bar{y}$.
(vi) Read off the trend values for different time periods from this trend line or curve.


If a straight line is used for locating the trend, then it becomes easy to estimate the
rate of change (the slope b of the line) by measuring the difference $y'_{x+1} - y'_x$ of the trend values
for any two consecutive time periods x and x + 1. Symbolically it is expressed as
$b = y'_{x+1} - y'_x$. The equation of the trend line is then summarised in slope-intercept form as

y' = a + bx with origin at any time period, so that a = trend value for the origin.

If the historigram indicates a non-linear trend, then in such situations it is generally
preferred to use a curve instead of a straight line to show the secular trend.
Merits.
(i) The free hand curve method is a simple, easy and quick method for measuring secular
trend.
(ii) The trend line or curve smoothes out seasonal variations.
(iii) A well fitted trend line or curve can give a close approximation to the trend based on a
mathematical model.
Demerits.
(i) It is a rough and crude method. It is greatly affected by personal bias, i.e., different
persons may fit different trends to the same data.
(ii) It requires too much practice to get a good fit.
(iii) The free hand curve method is subject to personal bias, so it is unable to give reliable
estimates.
Example 16.2 The following time series shows the number of road accidents in Punjab for the
years 1972 to 1978.

Year                   1972   1973   1974   1975   1976   1977   1978
Number of accidents    2493   2638   2699   3038   3745   4079   4688

(i) Obtain the historigram showing the number of road accidents and a free hand trend line
by drawing a straight line.

(ii) Find the trend values for this time series.

Solution. (i)

Year    Value y    Total    Mean    Trend value
1972    2493                        2200
1973    2638                        2550
1974    2699                        2950
1975    3038       23380    3340    3340
1976    3745                        3650
1977    4079                        4050
1978    4688                        4400

[Fig. 16.4 Number of road accidents]

(ii) Reading off the trend line, we get the trend values.


16.4.2 Method of Semi-averages. The secular trend is measured by the method of semi-averages
in the following steps.

(i) Divide the observed values of the time series into two equal periods. If the number of
observed values is odd, then it is advisable either to omit the middle value altogether or
to include the middle value in each half.

(ii) Take the average of each part and place these average values against the mid points of
the two parts. The average value of each part should be considered equal to the
average value of its respective trend values.

(iii) Plot the semi-averages on the graph of the original values.

(iv) Draw the required trend line through these two plotted points, and extend it to cover the
whole period.

(v) With two points located on the straight line, it is simple to compute the slope and
y-intercept of the line. This slope gives the estimate of the rate of change of values. Now,
the trend values are found either by reading off the semi-average trend line or from the
estimated straight line as explained below.


Semi-average Trend Line. Let $\bar{y}_1$ and $\bar{y}_2$ be the semi-averages placed against the times $x_1$ and
$x_2$, and let the estimated straight line (in slope-intercept form) $y' = a + bx$ pass through the
points $(x_1, \bar{y}_1)$ and $(x_2, \bar{y}_2)$. The constants a (y-intercept) and b (slope of the line) can be
easily determined. The equation of the line passing through the points $(x_1, \bar{y}_1)$ and $(x_2, \bar{y}_2)$
can be written as

$$y' - \bar{y}_1 = \frac{\bar{y}_2 - \bar{y}_1}{x_2 - x_1}\,(x - x_1)$$
$$y' - \bar{y}_1 = b\,(x - x_1) \quad \text{where } b = \frac{\bar{y}_2 - \bar{y}_1}{x_2 - x_1}$$
$$y' = (\bar{y}_1 - b x_1) + bx$$
$$y' = a + bx \quad \text{where } a = \bar{y}_1 - b x_1$$

If the number of time units in the observed time series is even, then the following formula may be
used to find the slope of the trend line.

$$b = \frac{\bar{y}_2 - \bar{y}_1}{n/2} = \frac{1}{n/2}\left(\frac{S_2}{n/2} - \frac{S_1}{n/2}\right) = \frac{S_2 - S_1}{n^2/4} = \frac{4\,(S_2 - S_1)}{n^2}$$

where $S_1$ = sum of y-values for the first half of the period,
$S_2$ = sum of y-values for the second half of the period,
n = number of time units covered by the time series.


Merits.
(i) The method of semi-averages is simple, easy and quick.
(ii) It gives an objective result.
(iii) It smoothes out seasonal variations.
(iv) It gives a better approximation to the trend than the free hand method, because it is
based on a mathematical model.

Demerits.
(i) The arithmetic mean, which is used to average the two halves of the observed values, is
highly affected by extreme values.
(ii) This method can only be applied if the trend is linear or approximately linear.
(iii) This method is not appropriate if the trend is not linear.
Example 16.3 The following table shows the property damaged by road accidents in Punjab
for the years 1973 to 1979.

Year                1973   1974   1975   1976   1977   1978   1979
Property damaged     201    238    392    507    484    649    742

(i) Obtain the semi-averages trend line.
(ii) Find out the trend values.

Solution. (i) Let x = t - 1973. The middle value (1976) is omitted from both halves.

Year   Property damaged   Semi-total   Semi-average   Coded year     Trend value
t      y                                              x = t - 1973   y' = 190 + 87x
1973   201                                            0              190 + 87(0) = 190
1974   238                831          277            1              190 + 87(1) = 277
1975   392                                            2              190 + 87(2) = 364
1976   507                                            3              190 + 87(3) = 451
1977   484                                            4              190 + 87(4) = 538
1978   649                1875         625            5              190 + 87(5) = 625
1979   742                                            6              190 + 87(6) = 712

The semi-averages trend line is
y' = a + bx
Taking the origin at 1973, we have
$\bar{y}_1 = 277$, $x_1 = 1$ and $\bar{y}_2 = 625$, $x_2 = 5$,

$$b = \frac{\bar{y}_2 - \bar{y}_1}{x_2 - x_1} = \frac{625 - 277}{5 - 1} = 87$$
$$a = \bar{y}_1 - b x_1 = 277 - 87(1) = 190$$

[Fig. 16.5 Property damaged: historigram and trend line plotted against year]

The semi-averages trend line is
y' = 190 + 87x with origin at 1973
(ii) For different values of x, the trend values are obtained as shown in the table. 
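The semi-average computation above can be checked with a short Python sketch. The function name `semi_average_trend` is hypothetical, and the sketch assumes the convention used in the worked examples: x = 0 at the first period, and the middle value is omitted when the number of observations is odd.

```python
def semi_average_trend(y):
    """Fit y' = a + b*x by the method of semi-averages.

    x is coded 0, 1, 2, ... from the first period. With an odd number
    of values the middle value is left out of both halves.
    """
    n = len(y)
    half = n // 2
    first, second = y[:half], y[n - half:]
    y1 = sum(first) / half              # semi-average of the first half
    y2 = sum(second) / half             # semi-average of the second half
    x1 = (half - 1) / 2                 # mid point of the first half
    x2 = (n - half) + (half - 1) / 2    # mid point of the second half
    b = (y2 - y1) / (x2 - x1)           # slope through the two points
    a = y1 - b * x1                     # trend value at the origin
    return a, b


damage = [201, 238, 392, 507, 484, 649, 742]   # Example 16.3, 1973 to 1979
a, b = semi_average_trend(damage)
trend = [a + b * x for x in range(len(damage))]

print(a, b)    # 190.0 87.0
print(trend)   # [190.0, 277.0, 364.0, 451.0, 538.0, 625.0, 712.0]
```

Applied to the book sales of Example 16.4, the same function returns a = 39.5 and b = -3.0.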
Example 16.4 The following table gives the number of books (in 000's) sold at a book stall
for the years 1973 to 1981.

Year                       1973   1974   1975   1976   1977   1978   1979   1980   1981
Number of books (000's)      42     38     35     25     32     24     20     19     17

(i) Find the equation of the semi-averages trend line.
(ii) Compute the trend values.
(iii) Estimate the number of books sold for the year 1982.

Solution. (i) Let x = t - 1973. The middle value (1977) is omitted; the semi-averages are
placed against x = 1.5 and x = 6.5.

Year   No. of books   Semi-total   Semi-average   Coded year     Trend value
t      y                                          x = t - 1973   y' = 39.5 - 3x
1973   42                                         0              39.5 - 3(0) = 39.5
1974   38                                         1              39.5 - 3(1) = 36.5
1975   35             140          35             2              39.5 - 3(2) = 33.5
1976   25                                         3              39.5 - 3(3) = 30.5
1977   32                                         4              39.5 - 3(4) = 27.5
1978   24                                         5              39.5 - 3(5) = 24.5
1979   20             80           20             6              39.5 - 3(6) = 21.5
1980   19                                         7              39.5 - 3(7) = 18.5
1981   17                                         8              39.5 - 3(8) = 15.5

The semi-averages trend line is
y' = a + bx
Taking the origin at 1973, we have
$\bar{y}_1 = 35$, $x_1 = 1.5$ and $\bar{y}_2 = 20$, $x_2 = 6.5$,

$$b = \frac{\bar{y}_2 - \bar{y}_1}{x_2 - x_1} = \frac{20 - 35}{6.5 - 1.5} = -3$$
$$a = \bar{y}_1 - b x_1 = 35 - (-3)(1.5) = 39.5$$

y' = 39.5 - 3x with origin at 1973

(ii) For different values of x, the trend values are obtained as shown in the table.

(iii) For the year 1982, we have x = 1982 - 1973 = 9. Then
y' = 39.5 - 3(9) = 12.5
16.4.3 Method of Moving Averages. If the observed values of a variable Y are $y_1, y_2, \ldots, y_n$,
corresponding to the time periods $t_1, t_2, \ldots, t_n$ respectively, then the k-period simple
moving averages are defined as

$$a_1 = \frac{1}{k}\,(y_1 + y_2 + \cdots + y_k) = \frac{1}{k}\sum_{i=1}^{k} y_i$$
$$a_2 = \frac{1}{k}\,(y_2 + y_3 + \cdots + y_{k+1}) = \frac{1}{k}\sum_{i=2}^{k+1} y_i$$
$$a_3 = \frac{1}{k}\,(y_3 + y_4 + \cdots + y_{k+2}) = \frac{1}{k}\sum_{i=3}^{k+2} y_i$$
$$\vdots$$
$$a_{n-k+1} = \frac{1}{k}\,(y_{n-k+1} + \cdots + y_n) = \frac{1}{k}\sum_{i=n-k+1}^{n} y_i$$

where $a_1, a_2, a_3, \ldots, a_{n-k+1}$ is the sequence of k-period simple moving averages. That is, the
k-period simple moving averages are calculated by averaging the first k observations and then
repeating this process of averaging k observations by dropping each time the first
observation and including the next one that has not been previously included. This process is
continued till the last k observations have been averaged. For example, the 3-period simple
moving averages are given as

$$a_1 = \frac{1}{3}\,(y_1 + y_2 + y_3) = \frac{1}{3}\sum_{i=1}^{3} y_i$$
$$a_2 = \frac{1}{3}\,(y_2 + y_3 + y_4) = \frac{1}{3}\sum_{i=2}^{4} y_i$$
$$a_3 = \frac{1}{3}\,(y_3 + y_4 + y_5) = \frac{1}{3}\sum_{i=3}^{5} y_i$$

and so on. Each simple moving average of the sequence $a_1, a_2, a_3, \ldots$ is placed
against the middle of each successive group. For practical purposes the k-period moving
successive totals $S_1, S_2, S_3, \ldots$ are obtained by the following relations

$$S_1 = \sum_{i=1}^{k} y_i$$
$$S_2 = S_1 + y_{k+1} - y_1$$
$$S_3 = S_2 + y_{k+2} - y_2$$

and so on. The k-period simple moving averages are obtained by dividing these k-period moving
successive totals $S_1, S_2, S_3, \ldots$ by k, as given in the following relations.

$$a_1 = \frac{S_1}{k}, \quad a_2 = \frac{S_2}{k}, \quad a_3 = \frac{S_3}{k}$$

and so on. Each moving average should be placed against the middle of its time period. Then it is
obvious that

(i) When k is odd, the sequence $a_1, a_2, a_3, \ldots$ of simple moving averages will
correspond directly to the observed values in the time series.

(ii) When k is even, the sequence $a_1, a_2, a_3, \ldots$ of simple moving averages will not
correspond directly to the observed values in the time series and will be placed in the
middle of two time periods. It is then sometimes necessary to centralize these averages
so that they correspond to the observed values in the time series. For
centralization, further 2-period moving averages of the former k-period moving
averages are computed, which are called k-period centred moving averages.
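The running-total relations for the moving successive totals translate directly into code. The sketch below is illustrative only; `moving_averages` is a hypothetical name, and the data are the silver-utensil figures of Example 16.5 with k = 3.

```python
def moving_averages(y, k):
    """k-period simple moving averages computed from moving totals.

    Each new total is the previous total plus the incoming value
    minus the outgoing value: S_{j+1} = S_j + y_{j+k} - y_j.
    """
    total = sum(y[:k])                 # S_1
    averages = [total / k]             # a_1 = S_1 / k
    for i in range(k, len(y)):
        total += y[i] - y[i - k]       # add the next value, drop the oldest
        averages.append(total / k)
    return averages


utensils = [170.0, 154.8, 156.5, 158.9, 140.3, 154.2, 160.7, 178.3]  # 1970 to 1977
three_year = [round(v, 2) for v in moving_averages(utensils, 3)]

print(three_year)  # [160.43, 156.73, 151.9, 151.13, 151.73, 164.4]
```

The list has n - k + 1 = 6 entries; with k = 3 they are placed against the years 1971 to 1976, so no trend value is available for the first and last year.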


Smoothing of a Time Series. The smoothing of a time series is a process of eliminating the
unwanted fluctuations in a time series. The moving averages tend to reduce the variation present
among the observed values of a time series, so they are used to eliminate the unwanted
fluctuations. Thus the moving averages may be used in smoothing of a time series. They
eliminate the effect of periodic fluctuations if moving averages of an appropriate period are
calculated. For this purpose the period of the moving average is chosen so that it is
equal to the period of at least one cycle. The secular trend is measured by taking the following
steps.

(i) Find the moving averages of an appropriate period.

(ii) Plot the points representing these moving averages on the graph of the observed time
series and join these points by line segments.

(iii) The graph of the moving averages indicates the secular trend by eliminating the periodic
fluctuations.

The period of the moving averages should be decided in the light of the periodicity of the time
series, because only the moving averages calculated by using a time period which
approximately coincides with the periodicity of the time series would eliminate, nearly
completely, all its regular fluctuations and show a trend.
Merits.
(i) The method of moving averages is easy and simple.

(ii) The moving averages of an appropriate period eliminate the periodic fluctuations, so the
method may be used to eliminate cyclical and seasonal fluctuations.


Demerits.
(i) The method of moving averages does not give the trend values at the beginning and at
the end of the original time series.

(ii) The moving averages are highly affected by extreme observations; however, the effect
may be reduced by using the geometric mean as the average.

(iii) The method of moving averages does not provide a mathematical equation for the trend;
therefore, the forecasting is only subjective.

(iv) The selection of an inappropriate period of moving averages may generate cycles which
are not present in the observed time series.


Example 16.5 The following table shows the production of silver utensils (in thousands) at a
certain factory in Gujranwala.

Year    Utensils (000)
1970    170.0
1971    154.8
1972    156.5
1973    158.9
1974    140.3
1975    154.2
1976    160.7
1977    178.3

(i) Calculate 3-year simple moving averages for this time series.
(ii) Also plot the actual data and the moving averages on a graph.

Solution.

Year    Production    3-year          3-year
t       y             moving total    moving average
1970    170.0
1971    154.8         481.3           160.43
1972    156.5         470.2           156.73
1973    158.9         455.7           151.90
1974    140.3         453.4           151.13
1975    154.2         455.2           151.73
1976    160.7         493.2           164.40
1977    178.3

[Fig. 16.6 Production of silver utensils: historigram and trend line plotted against year]






Example 16.6 The following table shows the food grain price index numbers by quarter for the
years 1962 and 1963.

Year    Quarter I   Quarter II   Quarter III   Quarter IV
1962    93          97           96            93
1963    97          102          106           98

Calculate the four-quarter centred moving averages.

Solution. The four-quarter centred moving averages are obtained as under:

(1)    (2)       (3)           (4)            (5)              (6)            (7)
Year   Quarter   Price index   4-quarter      4-quarter        2-quarter      4-quarter centred
                 number y      moving total   moving average   moving total   moving average
                                              (4) / 4          of (5)         (6) / 2
1962   I         93
       II        97
                               379            94.75
       III       96                                            190.50         95.25
                               383            95.75
       IV        93                                            192.75         96.38
                               388            97.00
1963   I         97                                            196.50         98.25
                               398            99.50
       II        102                                           200.25         100.12
                               403            100.75
       III       106
       IV        98

Alternatively, for the sake of convenience, the four-quarter centred moving averages may be
calculated as shown in the table given below:

(1)    (2)       (3)           (4)            (5)             (6)
Year   Quarter   Price index   4-quarter      4-quarter       4-quarter centred
                 number y      moving total   centred         moving average
                                              moving total    (5) / 8
1962   I         93
       II        97
                               379
       III       96                           762             95.25
                               383
       IV        93                           771             96.38
                               388
1963   I         97                           786             98.25
                               398
       II        102                          801             100.12
                               403
       III       106
       IV        98
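The shorter layout (centred moving total divided by 2k) can be sketched as follows. The function name `centred_moving_averages` is hypothetical; the data are the eight quarterly index numbers of Example 16.6.

```python
def centred_moving_averages(y, k):
    """k-period centred moving averages for an even period k.

    Two successive k-period moving totals are added and the sum is
    divided by 2k, which equals the 2-period moving average of the
    k-period simple moving averages.
    """
    totals = [sum(y[i:i + k]) for i in range(len(y) - k + 1)]
    return [(s + t) / (2 * k) for s, t in zip(totals, totals[1:])]


index = [93, 97, 96, 93, 97, 102, 106, 98]   # 1962 I to 1963 IV
centred = centred_moving_averages(index, 4)

print(centred)  # [95.25, 96.375, 98.25, 100.125]
```

The unrounded values 96.375 and 100.125 appear in the tables as 96.38 and 100.12. The first centred average is placed against 1962 Quarter III and the last against 1963 Quarter II.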





16.4.4 Method of Least Squares. For situations in which it is desirable to have a
mathematical equation to describe the secular trend of a time series, the most commonly used
method is to fit a straight line y = a + bx, a second degree parabola y = a + bx + cx²,
etc., where y is the value of a time series variable, x represents the time and all others are
constants. For determining the values of the constants appearing in such an equation, the most
widely used method is the method of least squares, because it is a practical method that provides
the best fit according to a reasonable criterion. The principle of least squares says that "the sum of
squares of the deviations of the observed values from the corresponding expected values should
be least".

Among all the trend lines approximating a given time series, the trend line for which the sum
of the squares of the deviations of the observed values from their corresponding expected values
is the least is called a least squares fit. The method of least squares consists of
minimizing the sum of the squares of these deviations. To avoid personal bias in measuring
the secular trend, this method is used to find a trend line approximating a given time series.
Secular Trend — Linear. It is useful to describe the trend in a time series where the amount of 
change is constant per unit time. 

Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be the n pairs of observed sample values of a
time series variable y, with x representing the coded time value. We can plot these n points on
a graph. Because $y_1, y_2, \ldots, y_n$ are observed values of a time series variable,
these points will not necessarily lie on a straight line. Let us suppose that we want to fit a straight
line expressed in slope-intercept form as

$$\hat{y} = a + bx$$

This line will be called the least squares line if it makes $\sum (y - a - bx)^2$ minimum. The
method of least squares yields the following normal equations.

$$\sum y = na + b\sum x, \qquad \sum xy = a\sum x + b\sum x^2$$

The normal equations give the values of a and b as

$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}, \qquad a = \frac{\sum y - b\sum x}{n}$$

However, if $\sum x = 0$, then the usual normal equations reduce to

$$\sum y = na, \qquad \sum xy = b\sum x^2$$

Therefore, the values of a and b also reduce to

$$a = \frac{\sum y}{n}, \qquad b = \frac{\sum xy}{\sum x^2}$$

The trend values $\hat{y}$ are computed from the least squares line $\hat{y} = a + bx$ by substituting the
values of x corresponding to the different time periods. The secular trend can be indicated on a
graph by plotting these estimated values against their respective time periods.

Properties:

(i) The least squares line always passes through the point $(\bar{x}, \bar{y})$, called the centre of
gravity of the data.

(ii) The sum of the deviations $\sum (y - \hat{y})$ of the observed values y from their corresponding
expected values $\hat{y}$ is zero, i.e.,

$$\sum (y - \hat{y}) = 0 \;\Rightarrow\; \sum y = \sum \hat{y}$$

(iii) The sum of squares of the deviations $\sum (y - \hat{y})^2$ measures how well the trend line fits
the data. A smaller $\sum (y - \hat{y})^2$ means a better fit.
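The formulas for a and b, and the first two properties, can be checked numerically. The following Python sketch is an illustration under stated assumptions: `least_squares_line` is a hypothetical name, and the data are the steel production figures of Example 16.7 with the origin at 1977.

```python
def least_squares_line(x, y):
    """Return (a, b) of the least squares line y-hat = a + b*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)   # slope
    a = (sy - b * sx) / n                           # y-intercept
    return a, b


x = [0, 1, 2, 3, 4, 5, 6]                           # x = t - 1977
y = [12.7, 10.1, 13.0, 13.2, 12.6, 14.2, 13.7]      # production, 1977 to 1983
a, b = least_squares_line(x, y)

fitted = [a + b * xi for xi in x]
residual_sum = sum(yi - fi for yi, fi in zip(y, fitted))
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)

print(round(a, 3), round(b, 3))                     # 11.629 0.386
print(abs(residual_sum) < 1e-9)                     # True: deviations sum to zero
print(abs((a + b * x_bar) - y_bar) < 1e-9)          # True: line passes through the means
```

With the unrounded slope the intercept is about 11.629; the worked example obtains 11.628 because it rounds b to 0.386 before computing a.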


Example 16.7 The following table shows the production of steel in a steel mill for the time
period 1977 to 1983.

Year          1977   1978   1979   1980   1981   1982   1983
Production    12.7   10.1   13.0   13.2   12.6   14.2   13.7

Find the linear trend by the method of least squares by taking the origin:
(i) at the beginning of the time period,
(ii) at the middle of the time period 1977 to 1983.
Calculate the trend values in both cases.

Solution. (i) Taking the origin at the beginning period, 1977 (i.e., July 1, 1977), we have
x = t - 1977.

Year    Production   Coded year                      Trend value
t       y            x = t - 1977   xy      x²       ŷ = 11.628 + 0.386x
1977    12.7         0              0       0        11.628 + 0.386(0) = 11.628
1978    10.1         1              10.1    1        11.628 + 0.386(1) = 12.014
1979    13.0         2              26.0    4        11.628 + 0.386(2) = 12.400
1980    13.2         3              39.6    9        11.628 + 0.386(3) = 12.786
1981    12.6         4              50.4    16       11.628 + 0.386(4) = 13.172
1982    14.2         5              71.0    25       11.628 + 0.386(5) = 13.558
1983    13.7         6              82.2    36       11.628 + 0.386(6) = 13.944
Total   89.5         21             279.3   91

The least squares trend line is
ŷ = a + bx
The least squares estimates a and b are

$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} = \frac{7(279.3) - (21)(89.5)}{7(91) - (21)^2} = 0.386$$
$$a = \frac{\sum y - b\sum x}{n} = \frac{89.5 - (0.386)(21)}{7} = 11.628$$

The best fitted line is
ŷ = 11.628 + 0.386x with origin at 1977

For different values of x, the trend values are obtained as shown in the table.

(ii) We have $\bar{t}$ = (1977 + 1983)/2 = 1980. Taking the origin at the middle of the time
period at 1980 (i.e., July 1, 1980), we have x = t - $\bar{t}$ = t - 1980.

Year    Production   Coded year                      Trend value
t       y            x = t - 1980   xy      x²       ŷ = 12.786 + 0.386x
1977    12.7         -3             -38.1   9        12.786 + 0.386(-3) = 11.628
1978    10.1         -2             -20.2   4        12.786 + 0.386(-2) = 12.014
1979    13.0         -1             -13.0   1        12.786 + 0.386(-1) = 12.400
1980    13.2          0               0     0        12.786 + 0.386(0) = 12.786
1981    12.6          1              12.6   1        12.786 + 0.386(1) = 13.172
1982    14.2          2              28.4   4        12.786 + 0.386(2) = 13.558
1983    13.7          3              41.1   9        12.786 + 0.386(3) = 13.944
Total   89.5          0              10.8   28

The least squares trend line is ŷ = a + bx
Since $\sum x = 0$, the least squares estimates a and b are

$$a = \frac{\sum y}{n} = \frac{89.5}{7} = 12.786$$
$$b = \frac{\sum xy}{\sum x^2} = \frac{10.8}{28} = 0.386$$

The best fitted line is
ŷ = 12.786 + 0.386x with origin at 1980


For different values of x, the trend values are obtained as shown in the table. 


Example 16.8 The consumer price index numbers for medical care (medical cost) are given in the
following table for the years 1980 to 1987. The base period 1979 is assigned the value 100,
which actually means 100 %.

Year     1980    1981    1982    1983    1984    1985    1986    1987
Index    106.0   111.1   117.2   121.3   125.2   128.0   132.6   138.0

Find a least squares linear trend by taking the origin at the middle of the time period,
(i) with unit of measurement as 1 year,
(ii) with unit of measurement as 1/2 year.
Compute the trend values in both cases.

Solution. (i) We have $\bar{t}$ = (1980 + 1987)/2 = 1983.5. Taking the origin at the middle of
the years 1983 and 1984 (i.e., January 1, 1984), with unit of measurement as 1 year, we have
x = t - $\bar{t}$ = t - 1983.5.

Year    Price index   Coded year                          Trend value
t       y             x = t - 1983.5   xy        x²       ŷ = 122.42 + 4.38x
1980    106.0         -3.5             -371.00   12.25    122.42 + 4.38(-3.5) = 107.09
1981    111.1         -2.5             -277.75    6.25    122.42 + 4.38(-2.5) = 111.47
1982    117.2         -1.5             -175.80    2.25    122.42 + 4.38(-1.5) = 115.85
1983    121.3         -0.5              -60.65    0.25    122.42 + 4.38(-0.5) = 120.23
1984    125.2          0.5               62.60    0.25    122.42 + 4.38(0.5) = 124.61
1985    128.0          1.5              192.00    2.25    122.42 + 4.38(1.5) = 128.99
1986    132.6          2.5              331.50    6.25    122.42 + 4.38(2.5) = 133.37
1987    138.0          3.5              483.00   12.25    122.42 + 4.38(3.5) = 137.75
Total   979.4          0                183.9    42

The least squares trend line is
ŷ = a + bx
Since $\sum x = 0$, the least squares estimates a and b are

$$a = \frac{\sum y}{n} = \frac{979.4}{8} = 122.42$$
$$b = \frac{\sum xy}{\sum x^2} = \frac{183.9}{42} = 4.38$$

The best fitted line is
ŷ = 122.42 + 4.38x with origin at the middle of the years 1983 and
1984 and unit of measurement as 1 year


For different values of x, the trend values are obtained as shown in the table. 


(ii) We have $\bar{t}$ = (1980 + 1987)/2 = 1983.5. Taking the origin at the middle of the
years 1983 and 1984 (i.e., January 1, 1984), with unit of measurement as 1/2 year, we have

$$x = \frac{t - \bar{t}}{1/2} = \frac{t - 1983.5}{1/2}$$

Year    Price index   Coded year                               Trend value
t       y             x = (t - 1983.5)/(1/2)   xy       x²     ŷ = 122.42 + 2.19x
1980    106.0         -7                       -742.0   49     122.42 + 2.19(-7) = 107.09
1981    111.1         -5                       -555.5   25     122.42 + 2.19(-5) = 111.47
1982    117.2         -3                       -351.6    9     122.42 + 2.19(-3) = 115.85
1983    121.3         -1                       -121.3    1     122.42 + 2.19(-1) = 120.23
1984    125.2          1                        125.2    1     122.42 + 2.19(1) = 124.61
1985    128.0          3                        384.0    9     122.42 + 2.19(3) = 128.99
1986    132.6          5                        663.0   25     122.42 + 2.19(5) = 133.37
1987    138.0          7                        966.0   49     122.42 + 2.19(7) = 137.75
Total   979.4          0                        367.8   168

The least squares trend line is ŷ = a + bx
Since $\sum x = 0$, the least squares estimates a and b are

$$a = \frac{\sum y}{n} = \frac{979.4}{8} = 122.42$$
$$b = \frac{\sum xy}{\sum x^2} = \frac{367.8}{168} = 2.19$$

The best fitted line is
ŷ = 122.42 + 2.19x with origin at the middle of the years 1983 and
1984 and unit of measurement as 1/2 year


For different values of x, the trend values are obtained as shown in the table. 

Example 16.9 The consumer price index numbers y for medical care (medical cost) were
given for the years 1980 to 1987. The base period 1979 was assigned the value 100.
The least squares linear trend, with x measured from the middle of 1983 and 1984
(i.e., January 1, 1984), and unit of measurement as 1/2 year, is

ŷ = 122.42 + 2.19x

(i) Compute the trend values.
(ii) Predict the consumer price index number for the year 1988.
(iii) In which year can we expect the index of medical cost to be double that of 1979,
assuming the present trend?

Solution. The least squares linear trend is
ŷ = 122.42 + 2.19x with origin at the middle of the years 1983 and
1984 and unit of measurement as 1/2 year
We have $\bar{t}$ = (1980 + 1987)/2 = 1983.5. Then

$$x = \frac{t - \bar{t}}{1/2} = \frac{t - 1983.5}{1/2}$$

(i) For different values of x, the trend values are shown in the table.

Year    Coded year               Trend value
t       x = (t - 1983.5)/(1/2)   ŷ = 122.42 + 2.19x
1980    -7                       122.42 + 2.19(-7) = 107.09
1981    -5                       122.42 + 2.19(-5) = 111.47
1982    -3                       122.42 + 2.19(-3) = 115.85
1983    -1                       122.42 + 2.19(-1) = 120.23
1984     1                       122.42 + 2.19(1) = 124.61
1985     3                       122.42 + 2.19(3) = 128.99
1986     5                       122.42 + 2.19(5) = 133.37
1987     7                       122.42 + 2.19(7) = 137.75

(ii) For t = 1988, we have

$$x = \frac{t - 1983.5}{1/2} = \frac{1988 - 1983.5}{1/2} = 9$$

The estimated consumer price index for the year 1988 is
ŷ = 122.42 + 2.19(9) = 142.13

(iii) The price index for 1979 is 100, so the expected price index for the required year t is 200.
Now 200 = 122.42 + 2.19x, which gives x = 35.4.

$$\text{But } x = \frac{t - 1983.5}{1/2}, \text{ so } 35.4 = \frac{t - 1983.5}{1/2} \;\Rightarrow\; 17.7 = t - 1983.5 \;\Rightarrow\; t \approx 2001$$
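Parts (ii) and (iii) amount to evaluating the trend line at a coded year and then inverting it. A minimal Python sketch follows; the names `A`, `B`, `ORIGIN`, `UNIT`, `trend` and `year_reaching` are hypothetical helpers, and the coefficients are the rounded values quoted in the example.

```python
A, B = 122.42, 2.19      # trend line y-hat = A + B*x from Example 16.9
ORIGIN = 1983.5          # middle of 1983 and 1984
UNIT = 0.5               # unit of measurement: 1/2 year


def coded(t):
    """Coded time x for calendar year t."""
    return (t - ORIGIN) / UNIT


def trend(t):
    """Trend value of the index for calendar year t."""
    return A + B * coded(t)


def year_reaching(level):
    """Calendar time at which the trend line reaches the given level."""
    x = (level - A) / B          # solve level = A + B*x for x
    return ORIGIN + UNIT * x     # convert the coded value back to years


print(coded(1988))                   # 9.0
print(round(trend(1988), 2))         # 142.13
print(round(year_reaching(200), 1))  # 2001.2
```

The inversion gives about 2001.2, which the worked solution reports as the year 2001. Such an extrapolation holds only as long as the fitted trend continues.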


Shifting of the Origin. To shift the origin of a given trend line k units from the previous
origin, we substitute x + k or x - k for x in the given trend line, depending upon whether the
new origin is forward or backward of the previous origin, and then find the trend line with the new
origin.
Thus in shifting the origin of a given linear trend the only change that takes place is the
change in the y-intercept. If we are to shift the origin k units forward, then to obtain the value of
the new y-intercept the previous y-intercept a is to be increased by k times the slope b, and if we
are to shift the origin k units backward, then to obtain the value of the new y-intercept the previous
y-intercept a is to be decreased by k times the slope b. That is, the value of the y-intercept of
the new trend line is the trend value at the new origin based on the previous trend line.

Thus, if with the previous origin the trend line is y = a + bx, then with the new origin k units from
the previous origin, the trend line is
y = a + b(x ± k) = (a ± bk) + bx.
Example 16.10 Suppose that the linear trend equation is y = 110 + 1.5x, with origin at
1980 and unit of measurement for x as one year. Shift the origin to 1985.
Solution. The linear trend equation is
y = 110 + 1.5x with origin at the year 1980
For shifting the origin to 1985, replace x by (x + 5):
y = 110 + 1.5(x + 5)
  = 110 + 1.5x + 7.5
  = 117.5 + 1.5x with origin at the year 1985
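The rule for shifting the origin of a linear trend is a one-line computation. In this sketch the function name `shift_origin` is hypothetical; a positive k moves the origin forward and a negative k moves it backward.

```python
def shift_origin(a, b, k):
    """Shift the origin of y = a + b*x by k units.

    The slope is unchanged; the new intercept is the old trend
    value at the new origin, a + b*k.
    """
    return a + b * k, b


forward = shift_origin(110, 1.5, 5)       # origin moved from 1980 to 1985
backward = shift_origin(117.5, 1.5, -5)   # and back again to 1980

print(forward)    # (117.5, 1.5)
print(backward)   # (110.0, 1.5)
```

Shifting forward and then backward by the same k recovers the original equation, which is a quick check on the sign convention.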


Secular Trend – Nonlinear. Many times a straight line will not describe accurately the long-term
movement of a time series. In such situations, by a careful look at the graph of the time series
we might detect some curvature and decide to fit a curve instead of a straight line.

Second degree curve (Parabola). This curve is useful to describe the trend in a time series
where the change in the amount of change is constant per unit time. The equation of the quadratic
(parabolic) trend is

$$\hat{y} = a + bx + cx^2$$

The method of least squares gives the normal equations as

$$\sum y = na + b\sum x + c\sum x^2$$
$$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$$
$$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$$

However, if $\sum x = 0 = \sum x^3$, then the usual normal equations reduce to

$$\sum y = na + c\sum x^2$$
$$\sum xy = b\sum x^2$$
$$\sum x^2 y = a\sum x^2 + c\sum x^4$$

which give the values of a, b and c as

$$c = \frac{n\sum x^2 y - (\sum x^2)(\sum y)}{n\sum x^4 - (\sum x^2)^2}, \qquad a = \frac{\sum y - c\sum x^2}{n}, \qquad b = \frac{\sum xy}{\sum x^2}$$

Example 16.11 Given the following time series.

Year           1931   1933   1935   1937   1939   1941   1943   1945
Price index      96     87     91    102    108    139    307    289

(i) Fit a second-degree curve (parabola) taking the origin at 1938.
(ii) Find the trend values.
(iii) What would have been the equation of the parabola if the origin were at 1933?

Solution. (i) We have $\bar{t}$ = (1931 + 1945)/2 = 1938. Let x = t - $\bar{t}$ = t - 1938.

Year   Price index   Coded year                                     Trend value
t      y             x = t - 1938   x²     x⁴     xy      x²y      ŷ
1931    96           -7             49     2401   -672     4704    100.3
1933    87           -5             25      625   -435     2175     83.0
1935    91           -3              9       81   -273      819     81.8
1937   102           -1              1        1   -102      102     96.7
1939   108            1              1        1    108      108    127.7
1941   139            3              9       81    417     1251    174.7
1943   307            5             25      625   1535     7675    237.8
1945   289            7             49     2401   2023    14161    317.0
Sum    1219           0            168     6216   2601    30995   1219.0

The quadratic trend is
ŷ = a + bx + cx²
Since $\sum x = 0 = \sum x^3$, the least squares estimates a, b and c are

$$c = \frac{n\sum x^2 y - (\sum x^2)(\sum y)}{n\sum x^4 - (\sum x^2)^2} = \frac{8(30995) - (168)(1219)}{8(6216) - (168)^2} = 2.01$$
$$a = \frac{\sum y - c\sum x^2}{n} = \frac{1219 - (2.01)(168)}{8} = 110.2$$
$$b = \frac{\sum xy}{\sum x^2} = \frac{2601}{168} = 15.48$$

The best fitted curve is
ŷ = 110.2 + 15.48x + 2.01x² with origin at the year 1938

(ii) For different values of x, the trend values are obtained as shown in the table.

(iii) For shifting the origin to 1933, replace x by (x - 5):
ŷ = 110.2 + 15.48(x - 5) + 2.01(x - 5)²
  = 110.2 + 15.48(x - 5) + 2.01(x² - 10x + 25)
  = 110.2 + 15.48x - 77.4 + 2.01x² - 20.1x + 50.25
  = 83.05 - 4.62x + 2.01x² with origin at the year 1933
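The parabola fit and the change of origin can be reproduced with the sketch below. The function names are hypothetical, and `fit_centred_parabola` assumes coded times with Σx = 0 = Σx³, as in this example.

```python
def fit_centred_parabola(x, y):
    """Least squares parabola y-hat = a + b*x + c*x**2.

    Valid only when sum(x) == 0 and sum(x**3) == 0, so that the
    normal equations reduce to the simplified form.
    """
    n = len(x)
    sy = sum(y)
    sx2 = sum(v ** 2 for v in x)
    sx4 = sum(v ** 4 for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    sx2y = sum(u ** 2 * v for u, v in zip(x, y))
    c = (n * sx2y - sx2 * sy) / (n * sx4 - sx2 ** 2)
    a = (sy - c * sx2) / n
    b = sxy / sx2
    return a, b, c


def shift_parabola(a, b, c, k):
    """Replace x by (x + k) in a + b*x + c*x**2 and collect terms."""
    return a + b * k + c * k ** 2, b + 2 * c * k, c


x = [-7, -5, -3, -1, 1, 3, 5, 7]                  # x = t - 1938
y = [96, 87, 91, 102, 108, 139, 307, 289]
a, b, c = fit_centred_parabola(x, y)
print(round(a, 1), round(b, 2), round(c, 2))      # 110.2 15.48 2.01

# Origin moved back to 1933, using the rounded coefficients of the text.
shifted = tuple(round(v, 2) for v in shift_parabola(110.2, 15.48, 2.01, -5))
print(shifted)                                    # (83.05, -4.62, 2.01)
```

Unlike the linear case, shifting the origin of a parabola changes both the constant term and the coefficient of x; only the coefficient of x² stays the same.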
Example 16.12 Given the following time series.

Year           1931   1933   1935   1937   1939   1941   1943   1945
Price index      96     87     91    102    108    139    307    289

(i) Fit a straight line taking the origin at 1938.
(ii) Fit a second-degree curve (parabola) taking the origin at 1938.
(iii) Which is the better fitted trend?

Solution. (i) We have $\bar{t}$ = (1931 + 1945)/2 = 1938. Let x = t - $\bar{t}$ = t - 1938.

Year   Price index   Coded year
t      y             x = t - 1938   x²     x⁴     xy      x²y      y²
1931    96           -7             49     2401   -672     4704     9216
1933    87           -5             25      625   -435     2175     7569
1935    91           -3              9       81   -273      819     8281
1937   102           -1              1        1   -102      102    10404
1939   108            1              1        1    108      108    11664
1941   139            3              9       81    417     1251    19321
1943   307            5             25      625   1535     7675    94249
1945   289            7             49     2401   2023    14161    83521
Sum    1219           0            168     6216   2601    30995   244225

The linear trend is
ŷ = a + bx
Since $\sum x = 0$, the least squares estimates a and b are

$$a = \frac{\sum y}{n} = \frac{1219}{8} = 152.38$$
$$b = \frac{\sum xy}{\sum x^2} = \frac{2601}{168} = 15.48$$

The best fitted line is
ŷ = 152.38 + 15.48x with origin at the year 1938
The sum of squares of residuals is

$$\sum e^2 = \sum y^2 - a\sum y - b\sum xy = 244225 - 152.38(1219) - 15.48(2601) = 18210.3$$

(ii) The quadratic trend is
ŷ = a + bx + cx²
Since $\sum x = 0 = \sum x^3$, the least squares estimates a, b and c are

$$c = \frac{n\sum x^2 y - (\sum x^2)(\sum y)}{n\sum x^4 - (\sum x^2)^2} = \frac{8(30995) - (168)(1219)}{8(6216) - (168)^2} = 2.01$$
$$a = \frac{\sum y - c\sum x^2}{n} = \frac{1219 - (2.01)(168)}{8} = 110.2$$
$$b = \frac{\sum xy}{\sum x^2} = \frac{2601}{168} = 15.48$$

The best fitted curve is
ŷ = 110.2 + 15.48x + 2.01x² with origin at the year 1938

The sum of squares of residuals is

$$\sum e^2 = \sum y^2 - a\sum y - b\sum xy - c\sum x^2 y = 244225 - 110.2(1219) - 15.48(2601) - 2.01(30995) = 7327.77$$


(iii) Since the sum of squares of the residuals for quadratic trend is smaller than the sum of 
squares of the residuals for linear trend, therefore quadratic trend is better fitting trend. 
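The comparison in part (iii) can be repeated by computing the residuals directly instead of using the shortcut formulas. The sketch below is illustrative; the helper name `sse` is hypothetical, and the coefficients are kept unrounded, so the sums differ slightly from the 18210.3 and 7327.77 obtained above with rounded coefficients.

```python
x = [-7, -5, -3, -1, 1, 3, 5, 7]                  # x = t - 1938
y = [96, 87, 91, 102, 108, 139, 307, 289]
n = len(x)

sy = sum(y)
sx2 = sum(v ** 2 for v in x)
sx4 = sum(v ** 4 for v in x)
sxy = sum(u * v for u, v in zip(x, y))
sx2y = sum(u ** 2 * v for u, v in zip(x, y))

# Linear trend (sum of x is zero).
a_lin, b_lin = sy / n, sxy / sx2

# Quadratic trend (sum of x and sum of x**3 are zero).
c_quad = (n * sx2y - sx2 * sy) / (n * sx4 - sx2 ** 2)
a_quad = (sy - c_quad * sx2) / n
b_quad = sxy / sx2


def sse(predict):
    """Sum of squared deviations of y from the fitted values."""
    return sum((yi - predict(xi)) ** 2 for xi, yi in zip(x, y))


sse_lin = sse(lambda v: a_lin + b_lin * v)
sse_quad = sse(lambda v: a_quad + b_quad * v + c_quad * v ** 2)

print(round(sse_lin, 1))    # about 18210.8
print(round(sse_quad, 1))   # about 7378.7
print(sse_quad < sse_lin)   # True: the parabola fits better
```

Either way of computing leads to the same conclusion: the quadratic trend leaves a much smaller sum of squared residuals than the straight line.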


Merits.
(i) The method of least squares gives the most satisfactory measurement of the secular trend
in a time series, when the distribution of the deviations is approximately normal.
(ii) The least squares estimates are unbiased estimates of the parameters.
(iii) The superiority of this method lies in that the computations needed to determine the
linear, exponential or quadratic trend have been reduced to formulae.

Demerits.
(i) The method of least squares gives too much weight to extremely large deviations from
the trend.
(ii) The least squares line is the best only for the period to which it has reference.
(iii) The elimination or addition of a few more time periods may change its position.
(iv) The only real criterion for the selection of a method of measuring trend is the judgement
as to how well the trend line follows the general movement of the time series.
Uses of Secular Trend.

(i) The secular trend may be used either in determining how a time series has grown in the
past or in making a forecast.
(ii) The trend line is used to adjust a series to eliminate the effect of the secular trend in
order to isolate non-trend fluctuations.





Exercise 16.1
1. (a) What is meant by a time series? What are the different movements that may be present in a
time series? Describe each of them carefully.
(b) Explain the difference between histogram and historigram.
(c) Describe the following terms:
(i) Secular trend (ii) Seasonal variations
(iii) Cyclical fluctuations (iv) Irregular movements

2. (a) Describe various methods of measuring secular trend in a time series. Discuss the merits
and demerits of the methods of smoothing the data.

(b) Plot the following time series to obtain a historigram, and draw a free-hand trend
on the same graph paper:

Year     1982   1983   1984   1985   1986   1987   1988   1989   1990   1991   1992
Value    50.0   36.5   43.0   44.5   38.9   38.1   32.6   38.7   41.7   41.1   33.8







3. (a) What do you understand by the method of semi-averages used for smoothing a
time series? Give an example.

(b) The following table shows the property damaged by road accidents in Punjab for the
years 1972 to 1982.

Year               1972   1973   1974   1975   1976   1977   1978   1979   1980   1981   1982
Property damaged    213    203    238    392    507    441    649    473    342    365    330

Using the method of semi-averages, find the linear trend.
(y' = 270.2 + 20.2x with origin at 1972)


(c) The following table gives the number of books (in 000's) sold at a book stall for the
years 1970 to 1981.

Year        1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981

Number of | | 
books(000) 15, 918. 17) 42 938: 4.40) 9225.7, 20); 20 16 » 17 


Using semi-average method, find the trend line. Compute the trend values. 

(y’ = 32.01 — 1.47x with origin at 1970) 

What are moving averages? How is a time series smoothed by moving average method? 
Give an example. 


Draw a historigram of the following time series. Determine a trend line by a simple 
moving averages of 5-year from the following data: 


Year | 1921 1922 1923 1924 1925 1926 1927 1928 1929 1930 
"value | 102 108 130 140 158 180 196 210 220 230 
(127.6, 143.2, 160.8, 176.8, 192.8, 207.2) : 

Calculate 7-day moving averages for the following record of attendances: 


_ Week Sun. Mon Tues Wed Thurs Fri Sat. 
| panes 50 30 48 54 55 62 
2 28 52 41 42 50 41 meet 42 


Plot the given data and moving averages on the same graph. 

(46.14, 46.71, 47.00, 48.57, 47.71, 47.14, 45.14, 42.29 ) 

The following table shows the United States average monthly production of bituminous 
coal in millions of short tons for the years 1981-91. Construct  (#) yeas moving - 
averages (if) 4-year centred moving averages 


Year (1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 
Production | 50.0 36.5 43.0 445.389 381 387 326 411 417 338 
{ (i) 43.5, 40.7, 41.1, 40.0, 37.1, 37.6 38.6 37.3 (ii 42.1, 40.9, 40.6 38.6, 37.4, 
38.1 38.0} 

Compute 4-month centred moving averages from the following: 
Month — Jan Feb Mar _Apnil May June July Aug Sept Oct. 
Value 23 26 =. 28 30ee 3596137, MO.32 S348: 
(27.75, 29.88, 32.12, 33.50, 34.12, 34.88) | es a7 


(b) Find 4-quarter centred moving averages for the following data.

Year  Quarter I  Quarter II  Quarter III  Quarter IV
1948     71         72          78           84
1949     72         69          75           79
1950     73         80          85           86

Plot the original data and the trend values on a graph.
(76.38, 76.12, 75.38, 74.38, 73.88, 75.38, 78.0, 80.12)

7. (a) Explain the method of least squares used for finding a secular trend in a time series.

(b) Given the following time series:

Year 1968 1969 1970 1971 1972 1973 1974 1975 1976
wis | 3 2 SSS a eee a

Determine the linear trend using the least squares method, taking the origin at the
beginning of the time period. Estimate the value for the year 1978.
(y = 3.11 + 1.17x with origin at 1968; 14.78)

8. (a) The following time series shows the number of road accidents in Punjab for the years
1977 to 1987.

Year                 1977  1978  1979  1980  1981  1982  1983  1984  1985  1986  1987
Number of accidents  2493  2639  2669  3038  3745  4079  4683  4845  4505  4793  4728

(i) Use the method of least squares to fit a straight line, taking the origin at the
middle of the time period.
(ii) Find the trend values for this time series.
(iii) Estimate the number of road accidents in 1989.
{(i) y = 3837.91 + 271.37x with origin at 1982; (ii) 2481.05, 2752.42,
3023.79, 3295.16, 3566.54, 3837.91, 4109.28, 4380.65, 4652.02, 4923.39, 5194.76;
(iii) 5737.50}

(b) Fit a straight line ŷ = a + bx from the following results, for the years 1985-95
(both inclusive). Find out the trend values of y as well.
Σx = 0, Σy = 438.9, Σx² = 110, Σxy = -84.4
(ŷ = 39.9 - 0.77x with origin at 1990; 43.75, 42.98, 42.21, 41.44, 40.67,
39.90, 39.13, 38.36, 37.59, 36.82, 36.05)

9. (a) Fit a straight line to the following data, taking the origin at the middle of the time period
and the unit of measurement as 1/2 year, and find the trend values:

Year              1980  1981  1982  1983  1984  1985
Production (000)    10    12     8    10    14    16

(ŷ = 11.66 + 0.54x with origin at the middle of 1982 and 1983 and unit of
measurement as 1/2 year; 8.96, 10.04, 11.12, 12.20, 13.28, 14.36)




(b) Fit a straight line to the following data. Plot on the same graph paper the actual and trend
values.

Year   1970  1971  1972  1973  1974  1975  1976  1977
Value    12    15    18    25    20    22    26    30

(ŷ = 21 + 1.12x with origin at January 1, 1974; 13.16, 15.40, 17.64, 19.88,
22.12, 24.36, 26.60, 28.84)

(c) For the following time series, determine the trend by using the method of
(i) semi-averages,
(ii) 3-year moving averages,
(iii) least squares for fitting a straight line.

Year   1968  1969  1970  1971  1972  1973  1974  1975  1976
Value     2     4     6     8     7     6     8    10    12

Which of the trends do you prefer, and why?
{(i) y' = 3.8 + 0.8x with origin at 1968; (ii) 4.0, 6.0, 7.0, 7.0, 7.0, 8.0, 10.0;
(iii) ŷ = 7 + x with origin at 1972; least squares trend}

10. (a) Fit a second degree curve to the following time series and find the trend values.

Year        1980  1981  1982  1983  1984  1985  1986  1987   1988   1989   1990
Production  23.2  31.4  39.8  50.2  62.9  76.0  92.0  105.7  122.8  131.7  151.1

(ŷ = 76.64 + 13.0x + 0.3974x² with origin at 1985 and unit of x as 1 year;
21.6, 31.0, 41.2, 52.2, 64.0, 76.6, 90.0, 104.2, 119.2, 135.0, 151.6)

(b) Fit a quadratic curve to the following time series.

Year                 1924  1927  1930  1933  1936  1939  1942
Index of coal price   187   142   133   129   136   169   279

Use your results to estimate the value of the index for 1935.
(y = 119.9 + 11.89x + 11.99x² with origin at 1933 and unit of x as 3 years;
y = 1530169)

11. (a) Fit a second degree curve to the following time series.

Year           1980  1981  1982  1983  1984  1985  1986  1987
Quantum Index   100    87    96   102   139   210   289   307

(ŷ = 131.8 + 16.89x + 1.64x² with origin at January 1, 1984 and unit of x
as 1/2 year)

(b) Fit a quadratic curve (parabola) to the following data. Compute the trend values.

Year         1931  1933  1935  1937  1939  1941  1943  1945
Price index    96    87    91   102   108   139   307   289

(ŷ = 110.16 + 15.48x + 2.01x² with origin at 1938 and unit of x as one year;
100.3, 83.0, 81.8, 96.7, 127.7, 174.7, 237.7, 317.0)







12. (a) The following are the annual profits in thousands of rupees in a certain business:
Year    1977  1978  1979  1980  1981  1982  1983
Profit    88   101   105    91   113   120   132
(i) Fit a linear trend by the method of least squares and make an estimate of the
profits in 1985.
(ii) Fit a parabolic trend.
(iii) Determine which is the better fitting trend.
{(i) ŷ = 107.14 + 6.36x with origin at 1980 and unit of x as 1 year;
ŷ = 139.04; (ii) ŷ = 103.24 + 6.36x + 0.976x² with origin at 1980 and unit
of x as 1 year}
(b) Fit a quadratic trend from the following results, for the years 1985-95 (both inclusive).
Σx = Σx³ = 0, Σx² = 110, Σx⁴ = 1958,
Σy = 410, Σxy = 601, Σx²y = 4587
Find out the trend values of y as well. Estimate the trend value for the year 1996.
(ŷ = 31.6 + 5.46x + 0.568x² with origin at 1990 and unit of x as 1 year;
18.50, 18.85, 20.33, 22.95, 26.71, 31.60, 37.63, 44.79, 53.09, 62.53, 73.10; 84.81)
13. (a) Suppose that the linear trend equation is y = 50 + 2x, with origin at 1983 and the unit of
measurement for x being one year. Shift the origin to 1980.
(y = 44 + 2x, with origin at the year 1980)
(b) If the linear trend in the data for the years 1960 to 1965, both inclusive, with origin at
the middle of 1962 and 1963 is y = 1306.667 + 73.428x, the unit of x being one
year, then determine the trend line with origin at 1960 and hence determine the trend
values.
(y = 1123.097 + 73.428x; 1123.097, 1196.525, 1269.953, 1343.381, 1416.809,
1490.237)
(c) The parabolic trend equation for the profits of a company (in thousand rupees) is
y = 10.4 + 0.6x + 0.7x², with origin at 1980 and the unit of measurement for x being
one year. Shift the origin to 1975.
(y = 24.9 - 6.4x + 0.7x²)
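
The moving-average, least-squares and origin-shifting answers in this exercise can be checked mechanically. The Python sketch below is only an illustration under stated assumptions: the function names are not from the text, the x-coding follows the usual convention of placing the origin at the middle of the period, and the nine-value series passed to linear_trend is illustrative data.

```python
def moving_average(values, k):
    """Simple k-period moving averages, one per complete window."""
    return [sum(values[i:i + k]) / k for i in range(len(values) - k + 1)]


def centred_moving_average(values, k):
    """Centred averages for an even k: mean of each pair of adjacent k-period averages."""
    ma = moving_average(values, k)
    return [(a + b) / 2 for a, b in zip(ma, ma[1:])]


def coded_x(n):
    """Time codes that sum to zero: step 1 for odd n, step 2 (half-period units) for even n."""
    xs = [2 * i - (n - 1) for i in range(n)]
    return [x // 2 for x in xs] if n % 2 == 1 else xs


def linear_trend(values):
    """Least-squares line y = a + b*x with the origin at the middle of the period."""
    xs = coded_x(len(values))
    a = sum(values) / len(values)  # valid because sum(x) = 0
    b = sum(x * y for x, y in zip(xs, values)) / sum(x * x for x in xs)
    return a, b


def shift_origin(a, b, k):
    """Move the origin of y = a + b*x back by k time units (x_old = x_new - k)."""
    return a - b * k, b


# Question 4(b): 5-year moving averages.
series = [102, 108, 130, 140, 158, 180, 196, 210, 220, 230]
print([round(m, 1) for m in moving_average(series, 5)])
# [127.6, 143.2, 160.8, 176.8, 192.8, 207.2]

# Illustrative nine-value series (an assumption made for this sketch).
print(linear_trend([2, 4, 6, 8, 7, 6, 8, 10, 12]))  # (7.0, 1.0)

# Question 13(a): y = 50 + 2x with origin at 1983, shifted to 1980.
print(shift_origin(50, 2, 3))  # (44, 2)
```

With an odd number of periods the codes are ..., -1, 0, 1, ... in whole periods; with an even number they are ..., -3, -1, 1, 3, ... in half periods, which is why several answers above quote the unit of x as 1/2 year.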


. 





Exercise 16.2 


Objective Questions
1. With which particular characteristic movement of a time series would you mainly
associate each of the following:
(i) Increased demand for foot-wears before Eid. (S)
(ii) The decline in death rate due to advancement in science. (T)


(iii) A steel strike, delaying production for a week. (I)










(iv) Rise in the prices of certain consumer goods due to tax increase 
in the annual budget. 
(v) An era of prosperity in a business.
(vi) The festival sale.
(vii) The production of sugar recorded for 1986, 1987, ..., 1992.
(viii) The weekly statement of the sale of pens.
(ix) A fire in a factory delaying production for 3 weeks.
(x) An after-Eid sale in a departmental store.
(xi) A need for increased wheat production due to a constant
increase in population.
(xii) The monthly rainfall in inches in a city over a 5-year period.
(xiii) A recession in a business.
(xiv) An increase in employment during summer months.
(xv) A continually increasing demand for smaller automobiles.
2. State whether the following statements are true or false.
(i) The graph of a time series is called histogram. (false)
(ii) Secular trend is a short term variation. (false)
(iii) Seasonal variations are regular in nature. (true)
(iv) Secular trend has booms and depressions. (false)
(v) Irregular variations are not regular in nature. (true)
(vi) The increase in the school fee in private schools is an irregular
variation. (false)
(vii) The increase in the number of patients in the hospitals is like
secular trend in a time series. (true)
(viii) The increase in the number of patients of heat stroke in
summer is like secular trend in the time series. (false)
(ix) The secular trend is measured by a straight line when a time
series has an upward trend. (false)
(x) The secular trend is measured by semi-averages method when
trend is linear. (true)
(xi) The straight line is fitted to a time series when the movements
in the time series are linear. (true)
(xii) In the measurement of secular trend by the method of least
squares, the number of years must be odd. (false)
(xiii) For a least squares linear trend y = a + bx, the b is a
variable and y is the slope of the line. (false)
(xiv) Seasonal variations can be measured only when the time series
contains yearly values. (false)





3. Multiple choice: Select a suitable answer:
(i) The graph of a time series is called
(a) histogram (b) polygon (c) straight line (d) historigram
(ii) The secular trend is measured by the method of semi-averages when:
(a) time series contains yearly values (b) trend is linear
(c) time series contains odd number of values (d) none of them
(iii) In the measurement of secular trend the moving averages
(a) give the trend in a straight line (b) measure the seasonal variations
(c) smooth out a time series (d) none of them
(iv) For a least squares linear trend ŷ = a + bx, the b is the:
(a) variable (b) intercept (c) trend (d) slope
(v) For a least squares linear trend ŷ = a + bx,
(a) Σy < Σŷ (b) Σŷ = 0 (c) Σy = Σŷ (d) none of them
(vi) For a least squares linear trend ŷ = a + bx, the Σ(y - ŷ)² = 0 when
(a) all the y-values lie on the line. (b) all the y-values are positive.
(c) all the y-values lie above the line. (d) none of them.
{(i) d, (ii) b, (iii) c, (iv) d, (v) c, (vi) a}








17 ORIENTATION 
OF COMPUTERS 


17.1 INTRODUCTION TO COMPUTER 


17.1.1 Computer. The computer is defined as an electronic device which is used to store and 
process data to solve different problems according to a set of instructions given to it. The word 
“computer” came from the word “compute” which means “to calculate”. 


17.1.2 Capabilities of Modern Computer. The following are the details of capabilities of a 
modern computer. 


Speed. The speed of a computer is defined as the number of instructions processed in one 
second. 


A computer can perform millions of instructions in one second. It performs one
operation at a time. When a computer performs an operation, the clock of the processor generates
electronic pulses at a fixed rate. It generates millions of pulses or signals in one second. The
number of pulses generated in one second is called the frequency. The unit of frequency is the hertz
(abbreviated Hz), which measures the number of cycles per second.

The speed of a computer is measured in megahertz (abbreviated MHz) and in gigahertz
(abbreviated GHz). Modern personal computers may have a speed of more than 3 GHz (1 GHz
= 1000 MHz).
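
As a small illustration of these units, the Python sketch below converts a clock frequency to megahertz and to the duration of one pulse. The 3 GHz figure comes from the text; the function names are assumptions made for this example, and the standard decimal relation 1 GHz = 1000 MHz is used.

```python
HZ_PER_MHZ = 1_000_000        # 1 MHz = 10**6 Hz
HZ_PER_GHZ = 1_000_000_000    # 1 GHz = 10**9 Hz (SI prefixes are decimal)


def ghz_to_mhz(ghz):
    """Express a frequency given in gigahertz in megahertz."""
    return ghz * HZ_PER_GHZ / HZ_PER_MHZ


def pulse_time_ns(ghz):
    """Duration of one clock pulse in nanoseconds."""
    return 1e9 / (ghz * HZ_PER_GHZ)


print(ghz_to_mhz(3))      # 3000.0
print(pulse_time_ns(2))   # 0.5
```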


Data Storage. A computer can store a large amount of data. It stores the data in its memory and
can retrieve it with a high speed. The ability of a computer to store the data and to retrieve it with
a very high speed makes it suitable for modern data processing.


Data is defined as a combination of characters, numbers and symbols collected for a 
specified purpose. 
Data Processing. Data processing consists of series of operations performed on the data to 
achieve the required results. 


The main function of a computer is data processing. It includes the arithmetic and logical 
operations. It also includes the classification of data, arrangement of data and its transmission 
from one place to another. The results of data processing are called the output or the information. 


Accuracy. The computers are very accurate in calculations. 


A modern computer can perform millions of operations in one second without any error.
The accuracy of calculations depends upon the input data and the program instructions. If the
input data and the program instructions are correct, then we expect that the computer will produce
accurate results.


Diligence. The computer has the ability to do work for long hours. It never tires. Working for 
long hours does not affect the accuracy of a computer. 




17.2 HISTORY OF COMPUTER 


The history of the computer and the calculator goes back a very long way. For many
centuries, people used their own brain-power to perform arithmetic calculations. The names of
three great scientists who contributed to the invention of the computer are:

1. Abu Jaafar Muhammad Ibn Musa Al-Khwarizmi (780 - 850)
2. Alan Mathison Turing (1912 - 1954)
3. John von Neumann (1903 - 1957)


Blaise Pascal, a mathematician and scientist of France, developed the first mechanical
adding machine, called “Pascaline”, in 1642. Pascaline performed addition and subtraction.
This machine was modified by Baron Gottfried Wilhelm von Leibnitz in 1671. He introduced the
“Multiplier Wheel” to perform all the basic arithmetic operations such as addition, subtraction,
multiplication and division.

The designer of the first computer was Charles Babbage, a mathematician of the United
Kingdom. He designed a machine called the “Analytical Engine” in 1837. The Analytical Engine was the
first programmable computer. It consisted of the following units.

(i) A storage (to store data)
(ii) A mill (to perform arithmetic operations)
(iii) A control unit (to control all operations and to coordinate the input/output units).


The program (instructions) was given to the Analytical Engine with the help of punched 
cards. 

The Americans were also experimenting to develop a computer. An American scientist
working at Harvard University developed a computer between 1937 and 1943. It was the
“Harvard Mark-I”.

In 1943, the American scientists J. W. Mauchly and J. P. Eckert developed an electronic
computer at the Moore School of Engineering, U.S.A. The electronic computer was called the Electronic
Numerical Integrator and Computer (ENIAC). Manufacturing of ENIAC was started in 1943
and finally completed in 1946. ENIAC differed in only one significant way from the computers of
today: its programs were stored externally on tape. This means that programs could be
executed sequentially.

In 1944 John von Neumann suggested that the computer program should actually be 
stored electronically inside the computer. This was the final breakthrough in computer design. 


17.3 TYPES OF COMPUTERS
The computers are of three types:
(i) Digital Computer
(ii) Analog Computer
(iii) Hybrid Computer
17.3.1 Digital Computer. A digital computer works with digits. It operates by counting
numbers or digits and gives output in digital form. It works with only two signals, 0 and 1. The
data and instructions are entered and stored in coded form of 0’s and 1’s.

These computers are manufactured in a wide variety of sizes, speeds and capacities. The
digital computers are commonly used in offices and educational institutions. Digital watches,
digital thermometers, etc., are examples of digital computers.













17.3.2 Analog Computer. An analog computer does not operate directly with digital signals.
It receives input and gives output in the form of an analog signal.

The analog computers measure physical quantities to give output on a scale. The output
is in the form of a graph or a reading on a scale. A dial clock, a thermometer and a weighing machine
are all examples of analog computers. The results achieved are less accurate than those
achieved by digital computers.

17.3.3 Hybrid Computer. A hybrid computer has features of both analog and digital
computers.

The hybrid computers get input and give output either in analog or digital form. A modem
is an example of a hybrid computer.


17.4 CLASSIFICATION OF COMPUTERS 


The computers are manufactured in a wide variety of sizes, speeds and capacities. In
computer terminology, size refers to the amount of data a computer can handle. Generally a
computer with a high processing speed is called a big computer. Depending upon their speed and
memory size, the computers are classified into the following groups:

(i) Micro Computers
(ii) Mini Computers
(iii) Mainframe Computers
(iv) Super Computers
17.4.1 Micro Computers. The micro computers or personal computers are designed to be
used by one user at a time. These are commonly used in offices, at homes and in educational
institutions. These computers have processing speeds of the order of millions of instructions
processed per second (MIPS). The peripherals used in these systems include a keyboard, a monitor,
a character or page printer and a mouse.

The micro computers are small in size and are mainly used in accounting, database, word
processing and graphics, etc. Laptops and notebooks are micro computers.
17.4.2 Mainframe Computers. The mainframe computers are very large computers. The 
mainframe computers have very high processing speed. These computers are used by large 
business organizations like banks, insurance companies, scientific research institutes and weather 
forecasting bureaus. The largest IBM S/390 mainframe, for example, can support 50,000 users
while executing more than 1,600,000,000 instructions per second. 
17.4.3 Mini Computers. The mini computers released in the 1960s got their name because of
their small size compared to the other computers of the day. They are smaller versions of the
mainframe computers. Like the mainframes, mini computers can handle much more data than
personal computers. These are used for maintaining details of a large business organization, to
analyse the results of experiments or to control and maintain the production activity in a factory.

The mini computers have large memory and faster input/output devices. They are more
expensive and have more processing speed than micro computers. The most powerful mini
computers can serve the input and output needs of hundreds of users at a time. The mini computers
cost anywhere from $18,000 to $500,000 and are ideal for many organizations and companies
that cannot afford or do not need mainframe systems.
17.4.4 Super Computers. The super computers are the most powerful computers made, and
physically they are some of the largest. These systems are built to process huge amounts of data,
and the fastest super computers can perform more than 1 trillion calculations per second.

Some super computers, such as the Cray T90 system, can house thousands of processors.
This speed and power make super computers ideal for handling large and highly complex
problems that require extreme calculating power. These computers are used by nuclear scientists
to create and analyze models of nuclear fission and fusion, predicting the actions and reactions
of millions of atoms as they interact. These computers are also being used to map the human genome,
or DNA structure. The super computers can cost tens of millions of dollars and consume enough
electricity to power dozens of homes.


17.5 HARDWARE AND SOFTWARE


17.5.1 Hardware. The physical parts of the computer are called hardware. It includes all 
physical devices or units that make up a computer. The examples of hardware are: CPU, monitor, 
mouse, keyboard, etc.


17.5.2 Software. The set of instructions given to the computer to solve a problem or to control
the operation of the computer is called software. The software is prepared in computer
programming languages. The examples of software are: Microsoft Word, Excel, Corel Draw,
Photoshop, etc.


17.6 HARDWARE COMPONENTS OF A PERSONAL COMPUTER 


The computer itself, the hardware, has many parts, but the critical components fall into
one of four categories.

1. Central Processing Unit (CPU)
2. Main Memory
3. Input/Output Devices
4. Secondary Storage
[Block diagram: input devices (mouse, keyboard); central processing unit (control unit,
arithmetic logic unit); main memory (RAM); output devices (monitor, printer).]

Fig. 17.1 Hardware Components of Personal Computers







17.6.1 Central Processing Unit (CPU). The central processing unit is the brain of the
computer, the place where data is manipulated. In large computer systems, such as super
computers and mainframe computers, processing tasks may be handled by multiple processing
chips. (Some powerful computer systems use hundreds or even thousands of separate processing
units.) In an average micro computer, the entire CPU is a single chip called a micro processor. The
CPU has at least two basic parts:


(i) Control Unit 
(ii) Arithmetic Logic Unit (ALU) 


Control Unit. All the computer’s resources are managed from the control unit. Think of the
control unit as a traffic cop directing the flow of data through the CPU, and to the other devices.
The control unit is the logical hub of the computer.

The CPU’s instructions for carrying out commands are built into the control unit. The
instructions, or instruction set, are expressed in microcode, a series of basic directions that tell the CPU
how to execute more complex operations.


Data will have to be first transferred from the input device or secondary storage to the
main memory and taken from there to the ALU for processing. Instructions on what to do with 
the data must be given to the ALU. Then the results have to be transferred to the main memory 
and from there to the output device. For these and many more such tasks we need a sort of 
manager. It is the control unit which takes care of all these activities. 


One of the most important functions of the control unit is the handling of program steps.
Each basic instruction such as ‘add’, ‘subtract’ or ‘store’ is in the form of a code. Only the control
unit understands each code and gets the instruction executed. In that process it may move data
from an input device to the memory, from the memory to the ALU, from the ALU back to the
memory, from memory to an output device and so on. The control unit is like the nervous system
of the body and supervises all the operations of the computer.


Arithmetic Logic Unit (ALU). The arithmetic logic unit is a part of the processor in which all 
arithmetic and logical operations on the data are performed. 


Arithmetic section of the ALU performs basic arithmetic operations such as addition, 
subtraction, multiplication and division. 


A logical operation is one in which data is compared. For example, whether the first
number is greater than the second number, or it is less than, equal to, not equal to, greater than or
equal to, etc. The logic section of the ALU performs logical operations.


Arithmetic operations    Logical operations
add                      equal to, not equal to
subtract                 greater than, not greater than
multiply                 less than, not less than
divide                   greater than or equal to, not greater than nor equal to
raise by a power         less than or equal to, not less than nor equal to

The ALU includes a group of registers, high-speed memory locations built directly into
the CPU that are used to hold the data currently being processed. For example, the control unit
might load two numbers from memory into the registers in the ALU. Then it might tell the ALU to
divide the two numbers (an arithmetic operation) or to see whether the numbers are equal (a
logical operation).
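
The split between arithmetic and logical operations can be sketched in a few lines of Python. This is only an illustration, not a description of real hardware: the function alu and the two lookup tables are names assumed for the example, and two ordinary variables stand in for the registers.

```python
import operator

# Arithmetic operations return a number.
ARITHMETIC = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
    "raise by a power": operator.pow,
}

# Logical operations compare two values and return True or False.
LOGICAL = {
    "equal to": operator.eq,
    "not equal to": operator.ne,
    "greater than": operator.gt,
    "less than": operator.lt,
    "greater than or equal to": operator.ge,
    "less than or equal to": operator.le,
}


def alu(operation, register_a, register_b):
    """Apply one arithmetic or logical operation to two register values."""
    table = ARITHMETIC if operation in ARITHMETIC else LOGICAL
    return table[operation](register_a, register_b)


# The control unit loads two numbers into registers, then issues an operation.
a, b = 12, 4
print(alu("divide", a, b))     # 3.0
print(alu("equal to", a, b))   # False
```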


17.6.2 Main Memory. The main memory, also called RAM (Random Access Memory) or
primary storage, is contained in the processor unit of the computer and temporarily stores data and
programme instructions when they are being processed.


The main memory has many storage locations. Each memory location has a storage
address, like a post box number. The computer stores or retrieves data using the address. The
computer always keeps a list of data items and corresponding addresses. This is, of course, done
automatically and we need not worry about it.

When the computer retrieves data from a location, it merely reads what is stored and
transfers it elsewhere. It does not destroy the stored data. On the other hand, when it stores new
data in a location, the previous contents in that location are lost.
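
The behaviour described above, reading without destroying and writing by overwriting, can be modelled with a short Python sketch. The class MainMemory and its method names are assumptions made for illustration; a Python list stands in for the addressed storage locations.

```python
class MainMemory:
    """A toy model of addressed storage locations."""

    def __init__(self, size):
        self.cells = [None] * size      # one cell per address

    def store(self, address, value):
        self.cells[address] = value     # previous contents are lost

    def retrieve(self, address):
        return self.cells[address]      # reading leaves the cell unchanged


memory = MainMemory(8)
memory.store(3, "A")
print(memory.retrieve(3), memory.retrieve(3))   # A A
memory.store(3, "B")
print(memory.retrieve(3))                       # B
```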


The most common measurement unit for describing a computer’s memory is the byte, the
amount of memory it takes to store a single character, such as a letter of the alphabet or a
numeral.


The measurement for Computer Memory and Storage

Unit      Abbreviation  Pronounced    Approximate value (bytes)        Actual value (bytes)
Kilobyte  KB            KILL-uh-bite  1,000                            1,024
Megabyte  MB            MEHG-uh-bite  1,000,000 (1 million)            1,048,576
Gigabyte  GB            GIG-uh-bite   1,000,000,000 (1 billion)        1,073,741,824
Terabyte  TB            TERR-uh-bite  1,000,000,000,000 (1 trillion)   1,099,511,627,776
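
The two value columns follow a simple rule: the approximate values are powers of 1000 and the actual values are powers of 1024 (that is, of 2 raised to the 10th power). A minimal Python sketch, with the names UNITS and unit_sizes assumed for the example, reproduces them:

```python
UNITS = ["KB", "MB", "GB", "TB"]


def unit_sizes():
    """Map each unit to (approximate bytes, actual bytes)."""
    return {unit: (1000 ** p, 1024 ** p) for p, unit in enumerate(UNITS, start=1)}


for unit, (approximate, actual) in unit_sizes().items():
    print(f"{unit}: approximately {approximate:,} bytes, actually {actual:,} bytes")
```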





Today's personal computers commonly have from 1 GB to 4 GB of memory. Some computers
improve their processing efficiency by using a limited amount of high-speed RAM between
the CPU and main memory. High-speed memory used in this manner is called cache (pronounced
cash) memory. Cache memory is used to store the most frequently used instructions and data. When
the processor needs the next program instructions and data, it first checks the cache memory. If the
required instruction or data is present in the cache (called a cache hit), the processor will execute faster
than if the instructions or data have to be retrieved from the slower main memory.
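
The idea of checking a small fast store before the slower main memory can be sketched as follows. This is a simplified illustration under stated assumptions: the class CachedMemory is a hypothetical name, and the rule for discarding an entry when the cache is full (oldest entry first) is chosen only for brevity; real processors use more elaborate policies.

```python
class CachedMemory:
    """Toy model of a small cache placed in front of a slower main memory."""

    def __init__(self, main_memory, cache_size=2):
        self.main = main_memory        # slow store: address -> value
        self.cache = {}                # small fast store
        self.cache_size = cache_size
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.cache:      # cache hit: no trip to main memory
            self.hits += 1
            return self.cache[address]
        self.misses += 1               # cache miss: fetch from main memory
        value = self.main[address]
        if len(self.cache) >= self.cache_size:
            oldest = next(iter(self.cache))   # dicts keep insertion order
            del self.cache[oldest]
        self.cache[address] = value
        return value


memory = CachedMemory({0: "LOAD", 1: "ADD", 2: "STORE"})
for address in [0, 1, 0, 0, 2, 1]:
    memory.read(address)
print(memory.hits, memory.misses)   # 3 3
```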


17.6.3 Input/Output Devices. Computers would be useless if they did not provide interaction
with users. They could not receive instructions or deliver the results of their work. Input devices accept data
and instructions from the user or from another computer system (such as a computer on the
internet). Output devices return processed data back to the user or to another computer system.

Input Devices. Before the processing unit can work, the data and programme must be entered into
the computer memory. This is done by means of input devices. The most common input devices
are keyboards, mice, scanners and digital cameras.

Output Unit. There are various devices to present information in a particular manner or to deliver
it at an appropriate speed, e.g., video display units, line printers and COM (Computer Output
Microfilm).







17.6.4 Secondary Storage. A computer can function with only a processing unit, memory, and input
and output devices. To be really useful, however, it also needs a place to keep programme files and
related data when it is not using them. The purpose of storage is to hold data.

It is important to understand the difference between how a computer uses main memory
and how it uses secondary storage. Main memory, also called primary storage or RAM,
temporarily stores programmes and data being processed. Secondary storage, also called auxiliary
storage, stores programmes and data when they are not being processed.


The physical components or materials on which data is stored are called storage media.
The hardware components that write data to, and read it from, storage media are called storage
devices. Two main categories of storage technology used today are magnetic storage and optical
storage. Although most storage devices and media employ one technology or the other, some use
both.
The primary types of magnetic storage are as follows:

(i) Diskettes
(ii) Hard disks (both fixed and removable)
(iii) High-capacity floppy disks
(iv) Disk cartridges
(v) Magnetic tape
The primary types of optical storage are as follows:

(i) Compact Disk Read-Only Memory (CD-ROM)
(ii) Digital Versatile Disk Read-Only Memory (DVD-ROM)
(iii) CD-Recordable (CD-R)
(iv) CD-ReWritable (CD-RW)
(v) Photo CD

The most common storage medium is the magnetic disk. A disk is a round, flat object
that spins around its centre. Read/write heads, which are similar to the heads of a tape recorder or
VCR, are used to read data from the disk or write data onto the disk. Depending on the type of
disk, read/write heads may float just above the disk’s surface or may actually touch the disk.


17.7 INPUT DEVICES AND OUTPUT DEVICES

17.7.1 Input Devices. Input devices consist of hardware that translates data into a form the
computer can process. The people-readable form may be words like the ones in these sentences,
but the computer-readable form consists of ‘0’ and ‘1’ or “off” and “on” electrical signals. Input
hardware devices are categorized into three types:

(i) Keyboards
(ii) Pointing devices
(iii) Source data entry devices
17.7.2 Keyboard. A keyboard is a device that converts letters, numbers and other characters
into electrical signals that are machine readable by the computer processor. The keyboard may
look like a typewriter keyboard to which some special keys have been added. A keyboard has 3
types of keys, namely







(i) Alphabet keys (A, B, C, ..., Z, a, b, c, ..., z)
(ii) Numeric keys (1, 2, 3, 4, 5, 6, 7, 8, 9, 0)
(iii) Special keys (F1, F2, ..., F12, Alt, Ctrl, Shift, Tab, Caps Lock, Enter, etc.)

The standard keyboard has 101 keys on it, and nowadays keyboards with 104,
106 or 110 keys are available in the market.


17.7.3 Pointing Devices. Pointing devices control the position of the cursor or pointer on the
screen. Pointing devices include:

(i) Mouse (ii) Light pens, etc.
Mouse. A mouse is a device that is rolled about on a desktop and directs a pointer on the
computer's display screen. The mouse has a cable that is connected to the micro computer’s system
unit by being plugged into a special port or socket. It has two or three buttons and may be wired or wireless.
On the bottom side of the mouse is a ball that translates the mouse movement into digital signals.

Depending upon the software, many commands that you can execute with a mouse can also be
performed through a keyboard. The following are the functions of a mouse.
(i) Point. Move the pointer to the desired spot on the screen, such as over a particular word
or object.
(ii) Click. Press and quickly release the left mouse button. A double click does this twice as
quickly as possible.
(iii) Drag. Hold down the mouse button while moving the pointer to another location.
(iv) Drop. Release the mouse button after dragging.
(v) Right click. Make a selection using the button on the right side of the mouse, which
usually brings up a pop-up menu.
Trackball. The trackball is a movable ball, on top of a stationary device, that is rotated with the fingers or
palm of the hand. Trackballs are specially suited to portable computers, which are often used in
confined places such as on airplane tray tables. Trackballs may appear on the keyboard, centred
below the space bar.
Joystick. A joystick is a pointing device that consists of a vertical handle, like a gearshift lever,
mounted on a base with one or two buttons.
Touch pad. The touch pad is a small, flat surface over which you slide your finger, using the
same movements as with a mouse.
Light Pen. The light pen is a light-sensitive stylus, or pen-like device, connected by a wire to the
computer terminal. The user brings the pen to a desired point on the display screen and presses the
pen button, which identifies that screen location to the computer.
17.7.4 Output Devices. The devices that are used to receive data from the CPU in machine code
and convert it into readable form are called output devices. The output devices enable the CPU to
transfer information to the user and other devices.
The output device receives data from the CPU in computer code and converts it into a form
that a user can understand or which is readable to the other devices. For example, the binary
string 01000001 from the CPU represents the letter 'A' on the screen. The output is divided into
two categories:
(i) The output that is sent to secondary storage, e.g., magnetic tape, disk, etc. This
output can be used by the CPU as input for further processing.
(ii) The output that can be read and used by people. This output is further divided into:







(a) Softcopy Output. It is the output that is temporary and is erased when the system is
switched off, e.g., the display on the computer screen.

(b) Hardcopy Output. It is the output that is permanent and is always available for use,
e.g., a printout on paper.


Softcopy Output Devices. Softcopy output devices are used to display output on the screen. They
are also called Visual Display Units (VDU). The most commonly used softcopy devices are:

(i) Monitors
(ii) Projectors
(iii) Sound Systems
Hardcopy Output Devices. The computer user usually needs output printed on paper for a
permanent record. The output received from the computer on paper is called hardcopy. The
devices used to produce a hardcopy are of two types:

(i) Printer
(ii) Plotter
[Figure: DATA INPUT (keyboard, mouse) -> DATA PROCESSING OPERATION (arithmetic logic unit,
control unit, memory unit, storage) -> INFORMATION OUTPUT (monitor, printer)]

Fig. 17.2 Information Processing System
17.8 SYSTEM SOFTWARE 


System software consists of all programs, including the operating system, that control the
operations of the computer equipment. Some of the functions that system software performs include:
starting up the computer; loading, executing, and storing application programs; storing and retrieving
files; and performing a variety of functions such as formatting disks, sorting data files, and translating
program instructions into machine language. System software can be classified into three major
categories: operating systems, utilities, and language translators.


17.9 OPERATING SYSTEM 


An operating system (OS) is an integrated set of programs that is used to manage the
various hardware resources of a computer system. Its prime objective is to improve the
performance and efficiency of a computer system and to increase facility, that is, the ease with
which a system can be used. Each time a computer is turned on or restarted, the operating system
is loaded into the computer and stored in the computer's main memory.


17.9.1 Functions of Operating Systems. The following are the functions of operating 
systems. 


(i) Processor management, that is, assignment of processors to the different tasks being
performed by the computer system.

(ii) Memory management, that is, allocation of main memory and other storage areas to
system programs as well as user programs and data.

(iii) Input/ Output management, that is, coordination and assignment of the different input 
and output devices while one or more programs are being executed. 

(iv) File management, that is, the storage of files on various storage devices and the transfer of
these files from one storage device to another. It also allows all files to be easily changed
and modified through the use of text editors or some other file manipulation routines.

(v) Establishment and enforcement of job priority system. That is, it determines and 
maintains the order in which jobs are to be executed in the computer system. 

(vi) Automatic transition from job to job as directed by special control statements. 

(vii) Interpretation of commands and instructions. 

(viii) Coordination and assignment of compilers, assemblers, utility programs and other 
software to the various users of the computer system. 


(ix) Establishment of data security and integrity. That is, it keeps different programs and
data in such a manner that they do not interfere with each other. Moreover, it also
protects itself from being destroyed by any user.

(x) Production of dumps, traces, error messages and other debugging and error-detecting
aids.

(xi) Maintenance of an internal time clock and a log of system usage for all users.

(xii) Facilitation of easy communication between the computer system and the computer (human)
operator.

The most commonly used operating systems are: 
(i) DOS (ii) WINDOWS
(iii) OS/2 (iv) UNIX 
(v) LINUX 


17.9.2 DOS. DOS stands for Disk Operating System, the most widely used operating system
on personal computers. Several slightly different but compatible versions of DOS exist. The two
most widely used, MS-DOS and PC-DOS, were both originally developed by Microsoft
Corporation in 1981.

MS-DOS has a text driven user interface, that is, the user types a line of text as a
command. The computer then executes the command. These commands can be used to format
disks; to copy, rename, delete and back up files; and to organize and manage files on disk. MS-DOS
versions 2.0 and up incorporated a tree structured hierarchical file management scheme. In this
scheme, files can be organized into groups, which are known as directories. MS-DOS version 4.0
and above added enhanced commands and support for networks, and added a user
interface called the DOS shell with pull down menus. A shell program usually provides a limited
graphic interface and certain utility functions such as file maintenance.




17.10 APPLICATION SOFTWARE 


Application software consists of programs that tell a computer how to produce
information. When you think of the different ways that people use computers in their careers or in
their personal lives, you are thinking of examples of application software. Business, scientific,
and educational programs are examples of application software. The most widely used
personal computer application software packages are:


(i) Word processing (ii) Desktop publishing

(iii) Spreadsheet (iv) Database

(v) Presentation graphics (vi) Communications 

(vii) Electronic mail (viii) Personal information management 


(ix) Project management 
17.11 PROGRAMMING LANGUAGES 


A programming language is a means of communication between the user and the
computer. With the help of a programming language, a programmer writes programs to solve
problems with the computer.


Each programming language has its own rules for writing a computer program. The rules 
are called the syntax of the language. The process of writing a computer program is called coding. 


17.11.1 Types of Programming Languages. Many computer programming languages are
available. Some programming languages are close to human language and some programming
languages are close to machine language. Therefore, programming languages are divided into two
types:

(i) Low level languages 

(ii) High level languages 


Low Level Language. The programming languages that are close to machine code are called low
level languages. The programs or instructions written in these languages are close to the
machine language instructions. The main low level languages are:

(i) Machine Language. The instructions written in this language are in the form of binary
strings of 0's and 1's. It is the fundamental language. The programs written in this
language are executed directly by the computer.

(ii) Assembly Language. It is similar to the machine language. In this language, symbolic
codes are used instead of binary codes. The symbolic codes are also called mnemonics.
The program written in this language is translated to machine code with the help of an
assembler. This language is also known as symbolic language.


High Level Language. The programming languages that are close to human languages are called
high level languages. The programs or instructions written in high level languages are close to
the English language. Each high level language has its own rules (syntax) and character set. Some
of the commonly used high level languages are:

ALGOL: ALGOL stands for ALGOrithmic Language.

BASIC: BASIC stands for Beginner's All-purpose Symbolic Instruction Code.

COBOL: COBOL stands for COmmon Business Oriented Language.

PASCAL: This language is named in honour of the French mathematician Pascal,
who invented one of the first mechanical calculators.

FORTRAN: FORTRAN stands for FORmula TRANslation.

C: It is a general purpose language. It is widely used in
scientific and all other fields.





17.12 LANGUAGE PROCESSORS AND TRANSLATORS 


The program that converts a source program, written in a programming language, into
machine code, i.e., into the form of strings of 0's and 1's, is called a language processor or
translator. There are three types of language processors or translators:

(i) Assembler 

(ii) Interpreter 

(iii) Compiler 
17.12.1 Assembler. An assembler translates a program written in an assembly language into 
machine code. 


[Figure: Assembly language program (input) -> ASSEMBLER -> Machine language program (output)]
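
The translation performed by an assembler can be pictured with a toy example. The sketch below is illustrative only: the mnemonics (LOAD, ADD, STORE, HALT), their 4-bit opcodes and the function name `assemble` are hypothetical and do not correspond to any real processor or assembler.

```python
# Hypothetical instruction set: mnemonic -> 4-bit opcode (not a real processor).
OPCODES = {"LOAD": "0001", "ADD": "0010", "STORE": "0011", "HALT": "1111"}


def assemble(source):
    """Translate lines such as 'ADD 6' into 8-bit binary strings."""
    machine_code = []
    for line in source.strip().splitlines():
        parts = line.split()
        mnemonic = parts[0]
        # The operand is optional; instructions like HALT use 0.
        operand = int(parts[1]) if len(parts) > 1 else 0
        machine_code.append(OPCODES[mnemonic] + format(operand, "04b"))
    return machine_code


program = """
LOAD 5
ADD 6
STORE 7
HALT
"""
print(assemble(program))
```

Each symbolic instruction becomes one string of 0's and 1's, which is the sense in which assembly language is a symbolic form of machine language.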





17.12.2 Interpreter. The language processors that execute a source program by translating and 
executing one instruction at a time are called interpreters. 
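
A minimal sketch of the "one instruction at a time" idea follows. The three-instruction language (SET, ADD, PRINT) and the function name `interpret` are made up for illustration; real interpreters are far more elaborate.

```python
def interpret(source):
    """Execute a tiny made-up language one instruction at a time."""
    variables = {}
    output = []
    for line in source.strip().splitlines():
        # Translate (parse) the current instruction ...
        command, *args = line.split()
        # ... and execute it immediately, before reading the next one.
        if command == "SET":        # SET name value
            variables[args[0]] = int(args[1])
        elif command == "ADD":      # ADD name value
            variables[args[0]] += int(args[1])
        elif command == "PRINT":    # PRINT name
            output.append(variables[args[0]])
        else:
            raise ValueError(f"unknown instruction: {command}")
    return output


print(interpret("SET x 2\nADD x 3\nPRINT x"))
```

Because each line is executed as soon as it is read, an error in a later line is only discovered when execution reaches it.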


17.12.3 Compiler. A compiler is a translator that converts a program written in a high-level
language into a machine language program.

[Figure: High level language program (input) -> COMPILER -> Machine language program (output)]





17.13. BASIC IDEA OF WRITING AND RUNNING A COMPUTER PROGRAM 


17.13.1 Computer Program. A computer program is a detailed set of instructions that directs
a computer to perform the tasks necessary to process data into information. These instructions,
usually written by a computer programmer, can be coded (written) in a variety of programming
languages. A computer program is also known as software.


17.13.2 Computer Program Development. Program development is the process of
producing one or more programs to perform specific tasks on a computer. The process of program
development has evolved into a series of five steps that, most experts agree, should take place
when any program is developed.

1. Review specification. The programmer reviews the specification created by the system
analyst during the system design phase.
2. Design. The programmer determines and documents the specific actions the computer
will take to accomplish the desired tasks.
3. Code. The programmer writes the actual program instructions.
4. Test. The written programs are tested to make sure they perform as intended.
5. Finalize documentation. Throughout the program development process, the
programmer documents, or writes, explanatory information about the program. In this step,
the documentation from steps 1 through 4 is brought together and organized.


[Figure: Five Steps of Program Development: REVIEW SPECIFICATIONS -> DESIGN -> CODE -> TEST ->
FINALIZE DOCUMENTATION]


17.14 NUMBER SYSTEM 


A set of digits, symbols and rules used to express quantities for counting, comparing
amounts, performing calculations, making measurements, representing values, etc. is called a
number system. A number system is named after the base of the system. The total number of
digits in a number system is called its base. The most commonly used number systems are:

1. Decimal Number System.

2. Binary Number System.

3. Octal Number System.

4. Hexadecimal Number System.

The most common number system is the decimal number system. It is used in normal
everyday life. High level computer languages nowadays use only the decimal number system. Earlier
programming languages required writing of long strings of numeric digits. Different number
systems were used as shortcuts for writing these strings. These number systems are no longer in
use. However, their knowledge is necessary for understanding data representation inside the
computer.


17.14.1 Decimal Number System. The base of the decimal number system is 10 and it consists of
10 digits from 0 to 9. In the decimal system, any number greater than 9 is represented by a
combination of decimal digits. Every digit in the number has its value that depends upon the
position or weight in the given number. For example, the position and weight of each digit of the
number 3046 are given below

$$3046 = 3 \times 10^3 + 0 \times 10^2 + 4 \times 10^1 + 6 \times 10^0$$


17.14.2 Binary Number System. The base of the binary number system is 2 and it consists of the
2 digits 0 and 1. In the binary system, any number greater than 1 is represented by a combination
of binary digits. Every digit in the number has its value that depends upon the position or weight
in the given number. For example, the position and weight of each digit of the number 1011 are
given below

$$1011 = 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 1 \times 2^0$$


17.14.3 Octal Number System. The base of the octal number system is 8 and it consists of 8
digits from 0 to 7. In the octal system, any number greater than 7 is represented by a combination
of octal digits. Every digit in the number has its value that depends upon the position or weight
in the given number. For example, the position and weight of each digit of the number 3046 are
given below

$$3046 = 3 \times 8^3 + 0 \times 8^2 + 4 \times 8^1 + 6 \times 8^0$$
17.14.4 Hexadecimal Number System. The base of the hexadecimal number system is 16 and it
consists of the 10 digits from 0 to 9 and the 6 letters from A to F. In the hexadecimal system, any
number greater than 15 is represented by a combination of hexadecimal digits and letters. Every
digit in the number has its value that depends upon the position or weight in the given number.
For example, the position and weight of each digit of the number 3046 are given below

$$3046 = 3 \times 16^3 + 0 \times 16^2 + 4 \times 16^1 + 6 \times 16^0$$
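
The positional expansions in Sections 17.14.1 to 17.14.4 can be checked with a short Python sketch. The function name `positional_value` is an illustrative choice, not something taken from the text; Python's built-in `int(text, base)` is used only as a cross-check.

```python
def positional_value(digits, base):
    """Return the decimal value of a digit string written in the given base."""
    symbols = "0123456789ABCDEF"
    total = 0
    for position, digit in enumerate(reversed(digits.upper())):
        weight = base ** position          # 1, base, base^2, ...
        total += symbols.index(digit) * weight
    return total


for digits, base in [("3046", 10), ("1011", 2), ("3046", 8), ("3046", 16)]:
    # int(text, base) is Python's built-in conversion, used here as a cross-check.
    assert positional_value(digits, base) == int(digits, base)
    print(digits, "in base", base, "=", positional_value(digits, base))
```

The same digits 3046 stand for different quantities in different bases: read in base 8 they denote the decimal number 1574, and read in base 16 they denote 12358.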


The following table represents the binary, octal and hexadecimal representations of a 
decimal number. 





Decimal   Binary representation                                          Octal            Hexadecimal
numbers                                                                  representation   representation

0         0                                                              0                0
1         1                                                              1                1
2         10     (one 2, no units)                                       2                2
3         11                                                             3                3
4         100                                                            4                4
5         101    (one 4, no twos, one unit)                              5                5
6         110                                                            6                6
7         111                                                            7                7
8         1000                                                           10               8
9         1001   (one 8, no fours, no twos, one unit)                    11               9
10        1010                                                           12               A
11        1011                                                           13               B
12        1100                                                           14               C
13        1101                                                           15               D
14        1110                                                           16               E
15        1111                                                           17               F
16        10000  (one 16, no eights, no fours, no twos, no units)        20               10
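
The binary, octal and hexadecimal representations of the decimal numbers 0 to 16 can be regenerated with Python's built-in `format` function. This is a small sketch; the helper name `representations` is an assumption made for the example.

```python
def representations(n):
    """Return the binary, octal and hexadecimal digit strings of a non-negative integer."""
    return format(n, "b"), format(n, "o"), format(n, "X")


print(f"{'Decimal':>7} {'Binary':>6} {'Octal':>5} {'Hex':>3}")
for n in range(17):
    binary, octal, hexadecimal = representations(n)
    print(f"{n:>7} {binary:>6} {octal:>5} {hexadecimal:>3}")
```

The last row shows why 16 is written 10 in hexadecimal: one sixteen and no units.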


17.15 HOW COMPUTERS REPRESENT DATA 


To a computer, everything is a number. Numbers are numbers; letters and punctuation
marks are numbers; sounds and pictures are numbers. Even the computer's own instructions are
numbers. When you see letters of the alphabet on a computer screen, you are seeing just one of
the computer's ways of representing numbers. For example, consider this sentence: Here are
some words. It may look like a string of alphabetic characters to you, but to a computer it looks
like the string of ones and zeros shown in the following table





Character   Binary code
H           0100 1000
e           0110 0101
r           0111 0010
e           0110 0101
(space)     0010 0000
a           0110 0001
r           0111 0010
e           0110 0101
(space)     0010 0000
s           0111 0011
o           0110 1111
m           0110 1101
e           0110 0101
(space)     0010 0000
w           0111 0111
o           0110 1111
r           0111 0010
d           0110 0100
s           0111 0011
.           0010 1110
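
The codes for the sentence "Here are some words." can be reproduced with a few lines of Python. This is a sketch that assumes the standard 8-bit ASCII codes; the function name `to_bits` is made up for the example.

```python
def to_bits(text):
    """Return the 8-bit ASCII code of each character, split into two groups of four."""
    rows = []
    for ch in text:
        bits = format(ord(ch), "08b")      # e.g. 'H' -> '01001000'
        rows.append(f"{bits[:4]} {bits[4:]}")
    return rows


sentence = "Here are some words."
for ch, bits in zip(sentence, to_bits(sentence)):
    print(repr(ch), bits)
```

The same function also confirms the example used for output devices: the letter 'A' corresponds to the binary string 01000001.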
17.16 BINARY SYSTEM AS A FOUNDATION 
OF COMPUTER PROGRAMMING 


In a computer, however, all data is represented by the state of the computer's electrical
switches. A switch has only two possible states, "on" and "off", so it can represent only two
numeric values. To a computer, when a switch is off, it represents a 0; when a switch is on, it
represents a 1. Because there are only two values, computers are said to function in base 2, which
is also known as the binary number system (bi means "2" in Latin). Why do we use binary
numbers instead of decimal numbers? The reasons are as follows:


1. The first and foremost reason is that electronic and electrical components, by their very
nature, operate in a binary mode. Information is handled in the computer by electronic/
electrical components such as transistors, semiconductors, wires, etc., all of which can
only indicate two states or conditions: on (1) or off (0). Transistors are either
conducting (1) or non-conducting (0); magnetic materials are either magnetized (1)
or non-magnetized (0) in one direction or in the opposite direction; a pulse or voltage
is present (1) or not present (0) in a wire. All information is represented within the
computer by the presence or absence of these various signals. The binary number system,
which has only two digits (0 and 1), is the most suitable and convenient way to
express the two possible states.


2. The second reason is that computer circuits only have to handle two binary digits rather
than ten decimal digits. The result is that the internal circuit design of computers is
simplified to a great extent. This ultimately results in less expensive and more reliable
circuits for computers.

3. Finally, the binary number system is used because everything that can be done in base
10 can also be done in binary.


The reason why the octal number system is used with computers is that it can
represent binary values in a more compact form and that the conversion between the binary
and the octal number system is very efficient.

The primary reason why the hexadecimal number system is used with computers is
that it can represent binary values in a more compact form and that the conversion
between the binary and the hexadecimal number system is most efficient. An eight-digit binary
number can be represented by a two-digit hexadecimal number.
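
The compactness comes from grouping: each hexadecimal digit stands for exactly four binary digits, and each octal digit for exactly three. The following sketch illustrates the grouping; the function names `binary_to_hex` and `binary_to_octal` are chosen for this example only.

```python
def binary_to_hex(bits):
    """Convert a binary string to hexadecimal, four bits per hex digit."""
    # Pad on the left so the length is a multiple of four.
    width = -(-len(bits) // 4) * 4
    bits = bits.zfill(width)
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)


def binary_to_octal(bits):
    """Convert a binary string to octal, three bits per octal digit."""
    width = -(-len(bits) // 3) * 3
    bits = bits.zfill(width)
    groups = [bits[i:i + 3] for i in range(0, len(bits), 3)]
    return "".join(format(int(group, 2), "o") for group in groups)


print(binary_to_hex("01000001"))    # eight binary digits -> two hex digits
print(binary_to_octal("01000001"))
```

No arithmetic on the whole number is needed: every group is converted independently, which is why the conversion is so efficient.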


Exercise 17.1 


1. (a) How are computers generally classified? What are the four major categories of
computers?
(b) What is a CPU? Why is it called the brain of the computer?
2. (a) Explain the working of the Arithmetic Logic Unit (ALU).
(b) Explain the Control Unit.
(c) What is secondary storage? How does it differ from primary storage?
3. (a) Describe the various input and output devices with examples.
4. (a) What is computer software?
(b) What are the functions of system software?
5. (a) What do you know about DOS?
(b) What does application software do and what are its generic types?
6. (a) What are computer languages and their types?
(b) What is an assembler?
(c) What is a compiler?
7. (a) What is the Binary Number System? Why is it used in computers?




Exercise 17.2
Objective Questions

1. Fill in the blanks.

(i) ______ is a commonly used input device. (keyboard)
(ii) 1 MB equals ______ bytes. (1,048,576)
(iii) Screen output is considered as a ______. (softcopy)
(iv) CD-ROM is a type of ______. (optical disk)
(v) ______ is a set of electronic instructions. (software)
(vi) The most common type of computer memory is called ______. (RAM)
(vii) A high speed memory that is built into the processor is called ______. (cache memory)
(viii) RAM is called ______ storage. (primary)
(ix) Arithmetic operations are carried out by the ______ unit. (ALU)
(x) The ______ is the TV type screen that you view your programs on. (monitor)
(xi) The ______ allows you to type information into the computer. (keyboard)
(xii) Keyboard, mouse and scanner are ______ devices. (input)

2. Mark off the following statements as true or false.

(i) 1 Kb = 1000 bytes. (true)
(ii) Plotter is an input device to draw the graphs of the output. (false)
(iii) A complete computer system has two parts: hardware and software. (true)
(iv) The keyboard and monitor are examples of output devices. (false)
(v) UNIX is an application software. (false)
(vi) The purpose of a storage device is to hold data. (true)
(vii) Base 2 is another name for the decimal number system. (false)
(viii) A CD-ROM is an example of a magnetic storage device. (false)
(ix) A hard disk may also be referred to as a secondary storage device. (true)
(x) The central processing unit (CPU) contains a Control Unit that performs arithmetic and
logic operations. (false)
(xi) All computers work on a binary number system. (true)
(xii) FORTRAN is a low-level language. (false)










SAMPLING 
DISTRIBUTIONS 
FROM NORMAL 
POPULATIONS 





A.1 Chi-square ($\chi^2$) Random Variable. If $Z_1, Z_2, \ldots, Z_\nu$ are $\nu$ independent
standard normal variables, then the sum of the squares of these $\nu$ random variables

$$\chi^2_\nu = \sum_{i=1}^{\nu} Z_i^2$$

follows a chi-square distribution with $\nu$ degrees of freedom. Thus $\chi^2$ is a continuous
random variable with its range

$$R = \{\chi^2 : 0 \le \chi^2 < \infty\}$$
A.2 Degrees of Freedom. The term degrees of freedom is defined as the number of
independent or "freely chosen" variables.
A.3 Shape of Chi-square Distribution. The chi-square distribution is positively skewed
rather than symmetric as is the normal. The skewness decreases as the value of its parameter $\nu$
increases. The chi-square distribution approaches the normal distribution with $\mu = \nu$ and
$\sigma^2 = 2\nu$ as the number of degrees of freedom tends to infinity.
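
The definition in A.1 and the moments quoted in A.3 can be illustrated by simulation. The sketch below is only a numerical illustration, not a proof; the choice of 5 degrees of freedom, 20000 draws, the seed and the function name `chi_square_draw` are all arbitrary assumptions.

```python
import random
import statistics


def chi_square_draw(nu, rng):
    """One draw: the sum of squares of nu independent standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))


rng = random.Random(1)            # fixed seed so the run is repeatable
nu = 5
draws = [chi_square_draw(nu, rng) for _ in range(20000)]

print("sample mean    :", statistics.mean(draws))       # should be near nu
print("sample variance:", statistics.variance(draws))   # should be near 2 * nu
```

Every simulated value is non-negative, in line with the range given in A.1, and the sample mean and variance settle close to the stated mean and variance of the distribution.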





Fig. A.1 Probability density function of the $\chi^2$ random variable


A.4 Quantiles of Chi-square Distributions. The probability that the chi-square random
variable $\chi^2_\nu$ is less than or equal to a positive number $\chi^2_{\nu;p}$ is represented by

$$F(\chi^2_{\nu;p}) = P(\chi^2_\nu \le \chi^2_{\nu;p}) = p$$

This function has been tabulated, and the quantiles $\chi^2_{\nu;p}$ of the chi-square distributions
can be found in Table 11. For example, if $\nu = 10$ and $p = 0.9$, then

$$P(\chi^2_{10} \le \chi^2_{10;0.9}) = P(\chi^2_{10} \le 15.99) = 0.9$$
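
The tabulated quantile can be checked numerically. For an even number of degrees of freedom the chi-square distribution function has a well-known closed form, $P(\chi^2_\nu \le x) = 1 - e^{-x/2} \sum_{k=0}^{\nu/2-1} (x/2)^k / k!$; this identity is a standard result that is not stated in the text, and the function name `chi_square_cdf_even` is an assumption of this sketch.

```python
import math


def chi_square_cdf_even(x, nu):
    """P(chi-square with nu d.f. <= x), valid only when nu is even."""
    if nu % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    tail = sum(half ** k / math.factorial(k) for k in range(nu // 2))
    return 1.0 - math.exp(-half) * tail


print(round(chi_square_cdf_even(15.99, 10), 3))
```

The result agrees with the probability 0.9 attached to the value 15.99 for 10 degrees of freedom.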
A.5 The T Random Variable. If the random variable $Z$ has a standard normal distribution
and another random variable $\chi^2$ has a chi-square distribution with $\nu$ degrees of freedom,
and if $Z$ and $\chi^2$ are independent, then the random variable

$$T = \frac{Z}{\sqrt{\chi^2/\nu}}$$

follows a t-distribution with $\nu$ degrees of freedom. Thus $T$ is a continuous random variable
with its range

$$R = \{t : -\infty < t < \infty\}$$
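
The construction of T from an independent standard normal variable and a chi-square variable can be imitated by simulation. This is a rough sketch under arbitrary assumptions (10 degrees of freedom, 20000 draws, a fixed seed, and the made-up function name `t_draw`).

```python
import math
import random


def t_draw(nu, rng):
    """One draw of T = Z / sqrt(chi-square / nu) with independent Z and chi-square."""
    z = rng.gauss(0.0, 1.0)
    chi_square = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))
    return z / math.sqrt(chi_square / nu)


rng = random.Random(2)            # fixed seed so the run is repeatable
draws = [t_draw(10, rng) for _ in range(20000)]
share_negative = sum(t < 0 for t in draws) / len(draws)
print("share of draws below zero:", share_negative)   # should be near 0.5
```

Both negative and positive values occur, and about half of the draws fall below zero, as expected for a variable whose range is the whole real line and whose distribution is symmetric about zero.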
A.6 Shape of t-distribution. The t-distribution is mound shaped, single humped and perfectly
symmetric about the value $t = 0$. As the number of degrees of freedom $\nu$ increases, the
t-distribution approaches the standard normal distribution.

[Figure: density curves f(t) plotted against t, with curves labelled N(0, 1); T, ν = 7; and
T, ν = 1 or Cauchy]

Fig. A.2 Probability density function of the T random variable
A.7 Quantiles of Student's t-distributions. The probability that the random variable $T_\nu$ is
less than or equal to a number $t_{\nu;p}$ is represented by

$$F(t_{\nu;p}) = P(T_\nu \le t_{\nu;p}) = p$$

This function has been tabulated, and the quantiles $t_{\nu;p}$ of the Student's t-distributions
can be found in Table 12. For example, if $\nu = 10$ and $p = 0.9$, then

$$P(T_{10} \le t_{10;0.9}) = P(T_{10} \le 1.372) = 0.9$$
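
The tabulated value can be checked by integrating the density of the t-distribution numerically. The density formula used below is the standard one for Student's t and is not given in the text; the function names `t_pdf` and `t_cdf` and the use of Simpson's rule with 2000 steps are choices made for this sketch.

```python
import math


def t_pdf(t, nu):
    """Density of Student's t with nu degrees of freedom (standard formula)."""
    coefficient = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coefficient * (1 + t * t / nu) ** (-(nu + 1) / 2)


def t_cdf(t, nu, steps=2000):
    """P(T <= t) for t >= 0: 0.5 plus Simpson's rule on [0, t]."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3


print(round(t_cdf(1.372, 10), 3))
```

The symmetry about zero supplies the 0.5 for the left half of the curve, so only the area between 0 and 1.372 has to be integrated.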








Table 7
Ordinates of the Standard Normal Curve at Z = z

Tabulated Values: $\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}$

   z |  0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09

-4.9 | 0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
-4.8 | 0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000  0.00000
-4.7 | 0.00001  0.00001  0.00001  0.00001  0.00001  0.00001  0.00000  0.00000  0.00000  0.00000
-4.6 | 0.00001  0.00001  0.00001  0.00001  0.00001  0.00001  0.00001  0.00001  0.00001  0.00001
-4.5 | 0.00002  0.00002  0.00001  0.00001  0.00001  0.00001  0.00001  0.00001  0.00001  0.00001

-4.4 | 0.00002  0.00002  0.00002  0.00002  0.00002  0.00002  0.00002  0.00002  0.00002  0.00002
-4.3 | 0.00004  0.00004  0.00004  0.00003  0.00003  0.00003  0.00003  0.00003  0.00003  0.00003
-4.2 | 0.00006  0.00006  0.00005  0.00005  0.00005  0.00005  0.00005  0.00004  0.00004  0.00004
-4.1 | 0.00009  0.00009  0.00008  0.00008  0.00008  0.00007  0.00007  0.00007  0.00006  0.00006
-4.0 | 0.00013  0.00013  0.00012  0.00012  0.00011  0.00011  0.00011  0.00010  0.00010  0.00009

-3.9 | 0.00020  0.00019  0.00018  0.00018  0.00017  0.00016  0.00016  0.00015  0.00014  0.00014
-3.8 | 0.00029  0.00028  0.00027  0.00026  0.00025  0.00024  0.00023  0.00022  0.00021  0.00021
-3.7 | 0.00042  0.00041  0.00039  0.00038  0.00037  0.00035  0.00034  0.00033  0.00031  0.00030
-3.6 | 0.00061  0.00059  0.00057  0.00055  0.00053  0.00051  0.00049  0.00047  0.00046  0.00044
-3.5 | 0.00087  0.00084  0.00081  0.00079  0.00076  0.00073  0.00071  0.00068  0.00066  0.00063

-3.4 | 0.00123  0.00119  0.00115  0.00111  0.00107  0.00104  0.00100  0.00097  0.00094  0.00090
-3.3 | 0.00172  0.00167  0.00161  0.00156  0.00151  0.00146  0.00141  0.00136  0.00132  0.00127
-3.2 | 0.00238  0.00231  0.00224  0.00216  0.00210  0.00203  0.00196  0.00190  0.00184  0.00178
-3.1 | 0.00327  0.00317  0.00307  0.00298  0.00288  0.00279  0.00271  0.00262  0.00254  0.00246
-3.0 | 0.00443  0.00430  0.00417  0.00405  0.00393  0.00381  0.00370  0.00358  0.00348  0.00337

-2.9 | 0.00595  0.00578  0.00562  0.00545  0.00530  0.00514  0.00499  0.00485  0.00470  0.00457
-2.8 | 0.00792  0.00770  0.00748  0.00727  0.00707  0.00687  0.00668  0.00649  0.00631  0.00613
-2.7 | 0.01042  0.01014  0.00987  0.00961  0.00935  0.00909  0.00885  0.00861  0.00837  0.00814
-2.6 | 0.01358  0.01323  0.01289  0.01256  0.01223  0.01191  0.01160  0.01130  0.01100  0.01071
-2.5 | 0.01753  0.01709  0.01667  0.01625  0.01585  0.01545  0.01506  0.01468  0.01431  0.01394

-2.4 | 0.02239  0.02186  0.02134  0.02083  0.02033  0.01984  0.01936  0.01888  0.01842  0.01797
-2.3 | 0.02833  0.02768  0.02705  0.02643  0.02582  0.02522  0.02463  0.02406  0.02349  0.02294
-2.2 | 0.03547  0.03470  0.03394  0.03319  0.03246  0.03174  0.03103  0.03034  0.02965  0.02898
-2.1 | 0.04398  0.04307  0.04217  0.04128  0.04041  0.03955  0.03871  0.03788  0.03706  0.03626
-2.0 | 0.05399  0.05292  0.05186  0.05082  0.04980  0.04879  0.04780  0.04682  0.04586  0.04491
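
The entries of Table 7 can be recomputed directly from the density of the standard normal curve. The following lines are a small sketch; the function name `normal_ordinate` is an assumption of this example, and the row and column labels are read as digits of z (for instance row -2.0 and column 0.05 give z = -2.05).

```python
import math


def normal_ordinate(z):
    """Height of the standard normal curve at z."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)


for z in (0.0, -1.0, -2.0, -2.05):
    print(f"{z:6.2f}  {normal_ordinate(z):.5f}")
```

Because the curve is symmetric about zero, the ordinate at -z equals the ordinate at z, which is why the rows for negative and positive z mirror each other.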


Table 7 (Continued)

   z |  0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09

-1.9 | 0.06562  0.06438  0.06316  0.06195  0.06077  0.05959  0.05844  0.05730  0.05618  0.05508
-1.8 | 0.07895  0.07754  0.07614  0.07477  0.07341  0.07206  0.07074  0.06943  0.06814  0.06687
-1.7 | 0.09405  0.09246  0.09089  0.08933  0.08780  0.08628  0.08478  0.08329  0.08183  0.08038
-1.6 | 0.11092  0.10915  0.10741  0.10567  0.10396  0.10226  0.10059  0.09893  0.09728  0.09566
-1.5 | 0.12952  0.12758  0.12566  0.12376  0.12188  0.12001  0.11816  0.11632  0.11450  0.11270

-1.4 | 0.14973  0.14764  0.14556  0.14350  0.14146  0.13943  0.13742  0.13542  0.13344  0.13147
-1.3 | 0.17137  0.16915  0.16694  0.16474  0.16256  0.16038  0.15822  0.15608  0.15395  0.15183
-1.2 | 0.19419  0.19186  0.18954  0.18724  0.18494  0.18265  0.18037  0.17810  0.17585  0.17360
-1.1 | 0.21785  0.21546  0.21307  0.21069  0.20831  0.20594  0.20357  0.20121  0.19886  0.19652
-1.0 | 0.24197  0.23955  0.23713  0.23471  0.23230  0.22988  0.22747  0.22506  0.22265  0.22025

-0.9 | 0.26609  0.26369  0.26129  0.25888  0.25647  0.25406  0.25164  0.24923  0.24681  0.24439
-0.8 | 0.28969  0.28737  0.28504  0.28269  0.28034  0.27798  0.27562  0.27324  0.27086  0.26848
-0.7 | 0.31225  0.31006  0.30785  0.30563  0.30339  0.30114  0.29887  0.29659  0.29431  0.29200
-0.6 | 0.33322  0.33121  0.32918  0.32713  0.32506  0.32297  0.32086  0.31874  0.31659  0.31443
-0.5 | 0.35207  0.35029  0.34849  0.34667  0.34482  0.34294  0.34105  0.33912  0.33718  0.33521

-0.4 | 0.36827  0.36678  0.36526  0.36371  0.36213  0.36053  0.35889  0.35723  0.35553  0.35381
-0.3 | 0.38139  0.38023  0.37903  0.37780  0.37654  0.37524  0.37391  0.37255  0.37115  0.36973
-0.2 | 0.39104  0.39024  0.38940  0.38853  0.38762  0.38667  0.38568  0.38466  0.38361  0.38251
-0.1 | 0.39695  0.39654  0.39608  0.39559  0.39505  0.39448  0.39387  0.39322  0.39253  0.39181
-0.0 | 0.39894  0.39892  0.39886  0.39876  0.39862  0.39844  0.39822  0.39797  0.39767  0.39733

 0.0 | 0.39894  0.39892  0.39886  0.39876  0.39862  0.39844  0.39822  0.39797  0.39767  0.39733
 0.1 | 0.39695  0.39654  0.39608  0.39559  0.39505  0.39448  0.39387  0.39322  0.39253  0.39181
 0.2 | 0.39104  0.39024  0.38940  0.38853  0.38762  0.38667  0.38568  0.38466  0.38361  0.38251
 0.3 | 0.38139  0.38023  0.37903  0.37780  0.37654  0.37524  0.37391  0.37255  0.37115  0.36973
 0.4 | 0.36827  0.36678  0.36526  0.36371  0.36213  0.36053  0.35889  0.35723  0.35553  0.35381

 0.5 | 0.35207  0.35029  0.34849  0.34667  0.34482  0.34294  0.34105  0.33912  0.33718  0.33521
 0.6 | 0.33322  0.33121  0.32918  0.32713  0.32506  0.32297  0.32086  0.31874  0.31659  0.31443
 0.7 | 0.31225  0.31006  0.30785  0.30563  0.30339  0.30114  0.29887  0.29659  0.29431  0.29200
 0.8 | 0.28969  0.28737  0.28504  0.28269  0.28034  0.27798  0.27562  0.27324  0.27086  0.26848
 0.9 | 0.26609  0.26369  0.26129  0.25888  0.25647  0.25406  0.25164  0.24923  0.24681  0.24439

 1.0 | 0.24197  0.23955  0.23713  0.23471  0.23230  0.22988  0.22747  0.22506  0.22265  0.22025
 1.1 | 0.21785  0.21546  0.21307  0.21069  0.20831  0.20594  0.20357  0.20121  0.19886  0.19652
 1.2 | 0.19419  0.19186  0.18954  0.18724  0.18494  0.18265  0.18037  0.17810  0.17585  0.17360
 1.3 | 0.17137  0.16915  0.16694  0.16474  0.16256  0.16038  0.15822  0.15608  0.15395  0.15183
 1.4 | 0.14973  0.14764  0.14556  0.14350  0.14146  0.13943  0.13742  0.13542  0.13344  0.13147





Table 7 (Continued)
   z |  0.00     0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09


1.5 
1.6 
1.7 
1.8 
1.9 


2.0 


2.1 
2.2 
2.3 
2.4 


25 
2.6 
pe | 
2.8 
29 


3.0 
3.1 
3.2 
3.3 
3.4 


3.5 
3.6 
a7 
3.8 
3.9 


4.0 
4.1 
4.2 
4.3 
4.4 


4.5 
4.6 
4.7 
48 


4.9 


0.12952 


| 0.11092 


0.09405 
0.07895 


0.06562 


0.05399 
0.04398 
0.03547 


| 0.02833 


0.02239 


0.01753 
| 0.01358 


0.01042 
0.00792 
0.00595 


| 0.00443 


0.00327 
0.00238 


| 0.00172 


0.00123 


0.00087 


| 0.00061 
| 0.00042 


0.00029 
0.00020 


0,00013 
0.00009 
0.00006 
0.00004 
0.00002 


| 0.00002 
0.00001 


0.00001 
0.00000 
0.00000 


0.00000 


0.12758 
0.10915 
0.09246 
0.07754 
0.06438 


0.05292 
0.04307 
0.03470 
0.02768 
0.02186 


0.01709 
0.01323 
0.01014 
0.00770 
0.00578 


0.00430 
0.00317 
0.0023 1 
0.00167 
0.00119 


0.00084 
0.00059 
0.0004 | 
0.00028 
0.00019 


0.00013 
0.00009 
0.00006 
0,00004 
0.00002 


0.00002 
0.00001
0.00001 
0.00000 


0.12566 
0.10741 
0.09089 
0.07614 
0.06316


0.05186 
0.04217 
0.03394 
0.02705 


0.02134 


0.01667 
0.01289
0.00987 
0.00748 
0.00562 


0.00417 
0.00307 
0.00224 
0.00161 
0.00115 


0.00081 
0.00057 
0.00039 
0.00027 
0.00018


0.00012 
0.00008 
0.00005 
0.00004 
0.00002 


0.00001 
0.00001 
0.00001 
0.00000 
0.00000 


0.12376 
0.10567 
0.08933 
0.07477 
0.06195 


0.05082 
0.04128 
0.03319 
0.02643 
0.02083 


0.01625 
0.01256 
0.00961 
0.00727 
0.00545 


0.00405 
0.00298 
0.00216 
0.00156 
0.00111 


0.00079 
0.00055 
0.00038
0.00026 
0.00018 


0.00012 
0.00008 
0.00005 
0.00003
0.00002


0.00001 
0.00001 
0.00001 
0.00000 
0.00000 




0.12188 
0.10396 
0.08780 
0.07341 
0.06077 


0.04980 
0.04041 
0.03246 
0.02582 
0.02033 


0.01585 
0.01223 
0.00935 
0.00707 
0.00530 


0.00393 
0.00288 
0.00210 
0.00151 
0.00107 


0.00076 
0.00053 
0.00037 
0.00025 
0.00017 


0.00011 
0.00008 
0.00005 
0.00003 
0.00002 


0.00001 
0.00001


0.00001 


0.00000 
0.00000 


0.12001
0.10226 
0.08628 
0.07206 
0.05959 


0.04879 
0.03955 
0.03174 
0.02522 
0.01984 


0.01545 
0.01191 
0.00909 
0.00687 
0.00514 


0.00381
0.00279 
0.00203 
0.00146 
0.00104


0.00073 
0.00051


0.00035 


0.00024 
0.00016 


0.00011
0.00007 
0.00005 
0.00003 
0.00002 


0.00001 
0.00001 
0.00001 
0.00000 


0.00000 


0.11816 


0.10059 


0.08478 
0.07074 
0.05844 


0.04780 
0.03871 
0.03103 
0.02463 
0.01936 


0.01506 
0.01160 
0.00885
0.00668 
0.00499 


0.00370 
0.00271 
0.00196 
0.00141 
0.00100 


0.00071 
0.00049 
0.00034 
0.00023 
0.00016


0.00011
0.00007 
0.00005 
0.00003 
0.00002 


0.00001 
0.00001 
0.00000 
0.00000 
0.00000 


0.11632 
0.09893 
0.08329 
0.06943 
0.05730 


0.04682 
0.03788 
0.03034 
0.02406 
0.01888 


0.01468 
0.01130 
0.00861 
0.00649
0.00485 


0.00358 
0.00262 
0.00190
0.00136 
0.00097 


0.00068 
0.00047 
0.00033 
0.00022 
0.00015 


0.00010
0.00007 
0.00004 
0.00003 
0.00002


0.00001 
0.00001 
0.00000 
0.00000
0.00000 


0.11450 
0.09728 
0.08183 
0.06814 
0.05618 


0.04586 
0.03706 
0.02965 
0.02349 
0.01842 


0.01431 
0.01100 
0.00837 
0.00631
0.00470 


0.00348 


0.00254 


0.00184 
0.00132 
0.00094 


0.00066 
0.00046 
0.00031
0.00021 
0.00014 


0.00010 
0.00006 
0.00004 
0.00003 
0.00002 


0.00001 
0.00001
0.00000 
0.00000 
0.00000 


0.00000 


0.11270 
0.09566 
0.08038 
0.06687 
0.05508 


0.04491 
0.03626 
0.02898 
0.02294 
0.01797 


0.01394 
0.01071 
0.00814
0.00613 
0.00457 


0.00337 
0.00246 
0.00178 
0.00127 
0.00090 


0.00063 
0.00044 
0.00030 
0.00021 
0.00014 


0.00009 


0.00006


0.00004 
0.00003
0.00002 


0.00001 
0.00001 
0.00000 
0.00000 
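
The entries above appear to be ordinates of the standard normal density, $\varphi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^{2}/2}$; for example, the entry 0.05399 in row 2.0, column 0.00 agrees with this formula. Under that assumption, a minimal Python sketch can reproduce any single entry. The function name `normal_ordinate` is illustrative and does not come from the text.

```python
import math


def normal_ordinate(z: float) -> float:
    """Ordinate of the standard normal density at z (assumed meaning of the table)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


# Row label plus column label gives z, e.g. row 2.0 and column 0.00 -> z = 2.00
for z in (1.50, 2.00, 3.05):
    print(f"z = {z:4.2f}   ordinate = {normal_ordinate(z):.5f}")
```

Because the density is symmetric about zero, the same function serves for negative z.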







Table 9


Values of the Standard Normal Cumulative Distribution Function


Tabulated Values: $\Phi(z) = P(Z \le z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}\, dz$


z
-4.5


-4.4 
-4.3
-4.2
-4.1
-4.0 


-3.9 
-3.8 


-3.7


-3.6 


-3.4 
-3.3
-3.2


-1.9 


-1.7
-1.6
-1.5


0.00 


0.00000 


0.00001 
0.00001 
0.00001 
0.00002
0.00003 


0.00005
0.00007
0.00011
0.00016
0.00023 


0.00034 
0.00048 
0.00069 
0.00097 


0.00135


0.00187 
0.00256 
0.00347 
0.00466 
0.00621 


0.00820 
0.01072 
0.01390 
0.01786 
0.02275 


0.02872 
0.03593 


0.04457


0.05480
0.06681 


0.01 


0.00000 


0.00001 
0.00001 
0.00001 
0.00002 
0.00003 


0.00005 
0.00007 
0.00010 
0.00015 
0.00022


0.00032 
0.00047 
0.00066 
0.00094 
0.00131 


0.00181
0.00248 
0.00336 
0.00453 


0.00604


0.00798 
0.01044 
0.01355 
0.01743 
0.02222 


0.02807 
0.03515
0.04363 
0.05370 
0.06552 


0.02 


0.00000 


0.00000 


0.00001
0.00001 
0.00002 
0.00003 


0.00004 
0.00007 
0.00010 
0.00015
0.00022 


0.00031


0.00045 
0.00064 
0.00090 
0.00126 


0.00175 


0.00240 
0.00326 
0.00440 
0.00587 


0.00776
0.01017 
0.01321 
0.01700 
0.02169 


0.02743 
0.03438 
0.04272 
0.05262 
0.06426 


0.03 


0.00000 


0.00000 
0.00001 
0.00001 
0.00002 
0.00003 


0.00004 
0.00006 
0.00010 
0.00014 
0.00021


0.00030 
0.00043 
0.00062 
0.00087 
0.00122


0.00169 
0.00233 
0.00317 
0.00427 
0.00570 


0.00755 
0.00990 
0.01287 
0.01659 
0.02118 


0.02680 


0.03362 


0.04182 
0.05155 
0.06301 


0.04 
0.00000 


0.00000 
0.00001
0.00001 
0.00002 
0.00003 


0.00004 
0.00006 
0.00009
0.00014 
0.00020


0.00029 
0.00042 
0.00060 
0.00084 
0.00118 


0.00164 
0.00226 
0.00307 
0.00415 
0.00554 


0.00734 
0.00964
0.01255 
0.01618 
0.02068 


0.02619 
0.03288 
0.04093 


0.05050


0.06178 





0.05 


0.00000 


0.00000
0.00001 
0.00001 
0.00002 
0.00003 
0.00004 
0.00006 
0.00009 
0.00013 
0.00019 


0.00028 
0.00040 
0.00058 
0.00082 
0.00114 


0.00159 
0.00219 
0.00298 
0.00402 
0.00539 


0.00714 
0.00939 
0.01222 
0.01578 
0.02018 


0.02559 
0.03216 
0.04006 
0.04947 
0.06057 


0.06 


0.00000 


0.00000 
0.00001 
0.00001 
0.00002 
0.00002 


0.00004 
0.00006 
0.00008 
0.00013 
0.00019 


0.00027 
0.00039 
0.00056 
0.00079 
0.00111 


0.00154 
0.00212 
0.00289 
0.00391 
0.00523 


0.00695 
0.00914 
0.01191 
0.01539 


0.01970 


0.02500 
0.03144 
0.03920 
0.04846 
0.05938 




0.07 


0.00000


0.00000 
0.00001 
0.00001 
0.00002 
0.00002 


0.00004 
0.00005 
0.00008 
0.00012 
0.00018 


0.00026 
0.00038
0.00054
0.00076 
0.00107 


0.00149 
0.00205
0.00280 
0.00379 
0.00508 


0.00676 
0.00889 
0.01160 
0.01500 
0.01923 


0.02442 
0.03074
0.03836 
0.04746 
0.05821 


0.08 


0.09 


0.00000 0.00000 


0.00000 0.00000 


0.00001 
0.00001 
0.00001 
0.00002 


0.00003 
0.00005 
0.00008 
0.00012 
0.00017 


0.00025 
0.00036 
0.00052 
0.00074 
0.00104 


0.00144 
0.00199 
0.00272 
0.00368 
0.00494 


0.00657 
0.00866 
0.01130 
0.01463 
0.01876 


0.02385 
0.03005 
0.03754 
0.04648 
0.05705 


0.00001 
0.00001 
0.00001 
0.00002 


0.00003 
0.00005 
0.00008
0.00011
0.00017 


0.00024 
0.00035
0.00050
0.00071
0.00100 


0.00139 
0.00193
0.00264 
0.00357 
0.00480 


0.00639 
0.00842 
0.01101 
0.01426 
0.01831 


0.02330 
0.02938
0.03673
0.04551 
0.05592 




Table 9 (Continued)


z


-1.4
-1.3
-1.2 
-1.1 
-1.0 


-0.9 
-0.8 
-0.7 
-0.6 
-0.5 


-0.4 


-0.1 
0.0 


0.0 
0.1 
0.2 
0.3 
0.4 


0.5 
0.6 
0.7 
0.8 
0.9 


1.0 
1.1 
1.2 
1.3 


1.5 
1.6 
1.7 
1.8 
1.9 


0.00 
0.08076 
0.09680 
0.11507 


0.13567 


0.15866 


0.18406 


0.21186 
0.24196 
0.27425 


0.30854


0.34458
-0.3
-0.2


0.38209 
0.42074 
0.46017 


0.50000


0.50000 
0.53983 
0.57926 
0.61791 
0.65542 


0.69146 
0.72575 
0.75804 
0.78814 


0.81594


0.84134
0.86433


0.88493 


0.90320
1.4 


0.91924 


0.93319 
0.94520 
0.95543 
0.96407 
0.97128 


0.01 


0.07927 
0.09510 
0.11314
0.13350 
0.15625 


0.18141 
0.20897 
0.23885 
0.27093 
0.30503 


0.34090 
0.37828 
0.41683 
0.45620 
0.49601 


0.50399
0.54380 
0.58317 
0.62172 
0.65910 


0.69497 
0.72907 
0.76115 
0.79103 
0.81859 


0.84375 
0.86650 
0.88686 
0.90490 
0.92073 


0.93448 
0.94630 
0.95637 
0.96485 
0.97193 


0.02 0.03 0.04 0.05 0.06 0.07 0.08 


0.07780 
0.09342 
0.11123 
0.13136 
0.15386 


0.17879 
0.20611 
0.23576 
0.26763 
0.30153 


0.33724 
0.37448 
0.41294 
0.45224 
0.49202 


0.50798 
0.54776 
0.58706 
0.62552 
0.66276 


0.69847 
0.73237 
0.76424 
0.79389 
0.82121 


0.84614 
0.86864 
0.88877 
0.90658 
0.92220 


0.93574 
0.94738 
0.95728 
0.96562 
0.97257 


0.07636 


0.09176


0.10935 
0.12924 
0.15151


0.17619 
0.20327 
0.23270 
0.26435 
0.29806 


0.33360 
0.37070 
0.40905 
0.44828
0.48803


0.51197 
0.55172
0.59095
0.62930
0.66640 


0.70194 
0.73565 
0.76730 
0.79673 
0.82381 


0.84849 


0.87076 
0.89065 
0.90824 
0.92364 


0.93699 
0.94845 
0.95818 
0.96638 
0.97320 




0.07493 
0.09012 
0.10749 
0.12714
0.14917 


0.17361 


0.20045 
0.22965 
0.26109 
0.29460 


0.32997 
0.36693 
0.40517 
0.44433 
0.48405 


0.51595 
0.55567 
0.59483 
0.63307 
0.67003 


0.70540 
0.73891 
0.77035
0.79955 
0.82639 


0.85083
0.87286
0.89251
0.90988
0.92507


0.93822 
0.94950 
0.95907 
0.96712 
0.97381 


0.07353 
0.08851 
0.10565 
0.12507 
0.14686 


0.17106 
0.19766 
0.22663 
0.25785 
0.29116 


0.32636 


0.36317 
0.40129 


0.44038 


0.48006


0.51994 
0.55962 
0.59871 
0.63683 
0.67364 


0.70884 
0.74215 
0.77337 
0.80234 
0.82894


0.85314 
0.87493 
0.89435
0.91149 
0.92647 


0.93943 
0.95053 
0.95994 
0.96784 
0.97441 


0.07215 
0.08692 


0.10383 


0.12302 
0.14457 


0.16853 
0.19489 
0.22363


0.25463 


0.28774 


0.32276 
0.35942 
0.39743 
0.43644 
0.47608 


0.52392 
0.56356 
0.60257 
0.64058 
0.67724 


0.71226 
0.74537 
0.77637 
0.80511 
0.83147


0.85543 


0.87698 
0.89617 
0.91308 


0.92785 


0.94062 
0.95154 
0.96080 
0.96856 


0.97500 


0.07078 
0.08534
0.10204 


0.12100 


0.14231 


0.16602 
0.19215 
0.22065 
0.25143 
0.28434 


0.31918 
0.35569 
0.39358 
0.43251 
0.47210 


0.52790


0.56749 
0.60642 
0.64431 
0.68082 


0.71566 
0.74857 
0.77935 
0.80785 
0.83398 


0.85769 
0.87900 
0.89796 
0.91466 
0.92922 


0.94179 


0.95254 
0.96164 
0.96926 
0.97558 


0.06944 
0.08379 
0.10027 
0.11900 
0.14007 


0.16354 
0.18943 
0.21770 
0.24825 
0.28096 


0.31561
0.35197
0.38974
0.42858
0.46812


0.53188 
0.57142 
0.61026 
0.64803 
0.68439 


0.71904 
0.75175 
0.78230 
0.81057 
0.83646 


0.85993 
0.88100 
0.89973 
0.91621 
0.93056 


0.94295 
0.95352 
0.96246 
0.96995 
0.97615 


0.09 
0.06811 
0.08226 
0.09853 
0.11702 
0.13786


0.16109 
0.18673 
0.21476 
0.24510 
0.27760 


0.31207 
0.34827 
0.38591 
0.42465 
0.46414 


0.53586 
0.57535 
0.61409 
0.65173 
0.68793


0.72240 
0.75490 
0.78524 
0.81327 
0.83891 


0.86214 
0.88298 
0.90147 
0.91774 
0.93189 


0.94408
0.95449 
0.96327 
0.97062 
0.97670 




Table 9 (Continued)


z

2.0 
2.1 
2.2 
2.3 
2.4 


2.5
2.6 
2.7 
2.8 
2.9 


3.0 
3.1
3.2 
3.3 
3.4 


3.5 
3.6 
3.7 
3.8 
3.9 


4.0 
4.1 
4.2 
4.3
4.4 


4.5 


0.00 
0.97725 


0.98214
0.98610


0.98928 


0.99180


0.99379 
0.99534 
0.99653 


0.99744
0.99813


0.99865 
0.99903 
0.99931 


0.99952
0.99966 


0.99977 
0.99984 
0.99989 
0.99993 


0.99995


0.99997
0.99998 
0.99999 
0.99999 


0.99999


1.00000 


0.01 


0.97778 
0.98257 
0.98645 
0.98956 
0.99202 


0.99396 
0.99547
0.99664 
0.99752 
0.99819 


0.99869 
0.99906 
0.99934 
0.99953 
0.99968 


0.99978
0.99985 
0.99990 
0.99993 
0.99995 


0.99997 
0.99998 
0.99999 
0.99999 
0.99999 


1.00000 




0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 


0.97831 
0.98300 
0.98679 
0.98983 
0.99224 


0.99413 
0.99560 
0.99674 
0.99760 
0.99825 


0.99874 
0.99910 


0.99936 


0.99955 
0.99969 


0.99978 
0.99985 
0.99990 
0.99993 
0.99996 


0.99997 
0.99998 
0.99999 
0.99999 
1.00000 


1.00000 


0.97882 
0.98341 
0.98713 
0.99010 
0.99245 


0.99430 
0.99573 
0.99683 
0.99767 
0.99831


0.99878 
0.99913 
0.99938 
0.99957 
0.99970 


0.99979 
0.99986 
0.99990 
0.99994 
0.99996 


0.99997 
0.99998 
0.99999 
0.99999 
1.00000 


1.00000


0.97932 
0.98382 
0.98745 
0.99036 
0.99266 


0.99446 
0.99585 
0.99693 
0.99774 
0.99836 


0.99882 
0.99916 
0.99940 
0.99958 
0.99971 


0.99980 
0.99986 
0.99991
0.99994 
0.99996 


0.99997 
0.99998 
0.99999 
0.99999 
1.00000 


1.00000 


0.97982 
0.98422 
0.98778 
0.99061
0.99286 


0.99461 
0.99598 
0.99702 
0.99781 
0.99841 


0.99886 
0.99918 
0.99942 
0.99960 
0.99972 


0.99981
0.99987
0.99991
0.99994 
0.99996 


0.99997 
0.99998 
0.99999 
0.99999 
1.00000 


1.00000 


0.98030 
0.98461
0.98809
0.99086 
0.99305 


0.99477 
0.99609
0.99711 
0.99788 
0.99846 


0.99889 
0.99921 
0.99944 
0.99961 
0.99973 


0.99981 
0.99987 
0.99992 
0.99994 
0.99996 


0.99998 
0.99998 
0.99999 
0.99999 


1.00000 1.00000 


1.00000 


0.98077 
0.98500 
0.98840 
0.99111 
0.99324 


0.99492 
0.99621 
0.99720 
0.99795 
0.99851 


0.99893 
0.99924 
0.99946 
0.99962 
0.99974 


0.99982 
0.99988 
0.99992 
0.99995 
0.99996 


0.99998 
0.99998 
0.99999 
0.99999 


1.00000 


0.98124 
0.98537 
0.98870 
0.99134 
0.99343 


0.99506 
0.99632 
0.99728 
0.99801 
0.99856 


0.99896 
0.99926 
0.99948 
0.99964 
0.99975 


0.99983 
0.99988 
0.99992 
0.99995 
0.99997 


0.99998 
0.99999 
0.99999 
0.99999 
1.00000 


1.00000 


0.98169 
0.98574 
0.98899 
0.99158 
0.99361 


0.99520 
0.99643 
0.99736 
0.99807 
0.99861 


0.99900 
0.99929 
0.99950 
0.99965 
0.99976 


0.99983 
0.99989 
0.99992 
0.99995 
0.99997 


0.99998 
0.99999 
0.99999 
0.99999 
1.00000 


1.00000 
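
The entries of Table 9 can be spot-checked numerically. The sketch below is a minimal illustration, not the method used to compile the table: it evaluates $\Phi(z)$ through the error function in Python's standard library, using the identity $\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(z/\sqrt{2}\right)\right]$. The function name `std_normal_cdf` is an assumption made for this example.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Phi(z) = P(Z <= z) for a standard normal variable Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Row label plus column label gives z, e.g. row -1.9 and column 0.06 -> z = -1.96
for z in (-1.96, -1.00, 0.00, 1.00, 2.66):
    print(f"z = {z:5.2f}   Phi(z) = {std_normal_cdf(z):.5f}")
```

The symmetry $\Phi(-z) = 1 - \Phi(z)$ can be used in the same way to cross-check a doubtful entry against its mirror image in the table.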


Table 10 (a)


Quantiles of the Standard Normal Distribution
(Inverse Standard Normal Cumulative Distribution Function)


Tabulated Values: $z_p = \Phi^{-1}(p)$, where $\Phi(z_p) = P(Z \le z_p) = \int_{-\infty}^{z_p} \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}\, dz = p$

p | 0.000 0.001
0.00 | -∞ -3.0902
0.01 | -2.3263 -2.2904
0.02 | -2.0537 -2.0335
0.03 | -1.8808 -1.8663
0.04 | -1.7507 -1.7392
0.05 | -1.6449 -1.6352
0.06 | -1.5548 -1.5464
0.07 | -1.4758 -1.4684
0.08 | -1.4051 -1.3984
0.09 | -1.3408 -1.3346
0.10 | -1.2816 -1.2759
0.11 | -1.2265 -1.2212
0.12 | -1.1750 -1.1700
0.13 | -1.1264 -1.1217
0.14 | -1.0803 -1.0758
0.15 | -1.0364 -1.0322
0.16 | -0.9945 -0.9904
0.17 | -0.9542 -0.9502
0.18 | -0.9154 -0.9116
0.19 | -0.8779 -0.8742
0.20 | -0.8416 -0.8381
0.21 | -0.8064 -0.8030
0.22 | -0.7722 -0.7688
0.23 | -0.7388 -0.7356
0.24 | -0.7063 -0.7031
0.25 | -0.6745 -0.6713
0.26 | -0.6433 -0.6403
0.27 | -0.6128 -0.6098
0.28 | -0.5828 -0.5799
0.29 | -0.5534 -0.5505
0.30 | -0.5244 -0.5215
0.31 | -0.4958 -0.4930
0.32 | -0.4677 -0.4649
0.33 | -0.4399 -0.4372
0.34 | -0.4125 -0.4097


0.002 


-2.8782
-2.2571
-2.0141
-1.8522
-1.7279


-1.6258
-1.5382
-1.4611
-1.3917
-1.3285


-1.2702
-1.2160
-1.1650
-1.1170
-1.0714


-1.0279
-0.9863
-0.9463
-0.9078
-0.8706


-0.8345
-0.7995
-0.7655
-0.7323
-0.6999


-0.6682
-0.6372
-0.6068
-0.5769
-0.5476


-0.5187
-0.4902
-0.4621
-0.4344
-0.4070


0.003 


-2.7478
-2.2262
-1.9954
-1.8384
-1.7169


-1.6164
-1.5301
-1.4538
-1.3852
-1.3225


-1.2646
-1.2107
-1.1601
-1.1123
-1.0669


-1.0237
-0.9822
-0.9424
-0.9040
-0.8669


-0.8310
-0.7961
-0.7621
-0.7290
-0.6967


-0.6651
-0.6341
-0.6038
-0.5740
-0.5446


-0.5158
-0.4874
-0.4593
-0.4316
-0.4043




0.004 


-2.6521
-2.1973
-1.9774
-1.8250
-1.7060


-1.6072
-1.5220
-1.4466
-1.3787
-1.3165


-1.2591
-1.2055
-1.1552
-1.1077
-1.0625


-1.0194
-0.9782
-0.9385
-0.9002
-0.8632


-0.8274
-0.7926
-0.7588
-0.7257
-0.6935


-0.6620
-0.6311
-0.6008
-0.5710
-0.5417


-0.5129
-0.4845
-0.4565
-0.4289
-0.4016




0.005 0.006 0.007 0.008 0.009


-2.5758
-2.1701
-1.9600
-1.8119
-1.6954


-1.5982
-1.5141
-1.4395
-1.3722
-1.3106


-1.2536
-1.2004
-1.1503
-1.1031
-1.0581


-1.0152
-0.9741
-0.9346
-0.8965
-0.8596


-0.8239
-0.7892
-0.7554
-0.7225
-0.6903


-0.6588
-0.6280
-0.5978
-0.5681
-0.5388


-0.5101
-0.4817
-0.4538
-0.4261
-0.3989


-2.5121
-2.1444
-1.9431
-1.7991
-1.6849


-1.5893
-1.5063
-1.4325
-1.3658
-1.3047


-1.2481
-1.1952
-1.1455
-1.0985
-1.0537


-1.0110
-0.9701
-0.9307
-0.8927
-0.8560


-0.8204
-0.7858
-0.7521
-0.7192
-0.6871


-0.6557
-0.6250
-0.5948
-0.5651
-0.5359


-0.5072
-0.4789
-0.4510
-0.4234
-0.3961


-2.4573
-2.1201
-1.9268
-1.7866
-1.6747


-1.5805
-1.4985
-1.4255
-1.3595
-1.2988


-1.2426
-1.1901
-1.1407
-1.0939
-1.0494


-1.0069
-0.9661
-0.9269
-0.8890
-0.8524


-0.8169
-0.7824
-0.7488
-0.7160
-0.6840


-0.6526
-0.6219
-0.5918
-0.5622
-0.5330


-0.5044
-0.4761
-0.4482
-0.4207
-0.3934


-2.4089
-2.0969
-1.9110
-1.7744
-1.6646


-1.5718
-1.4909
-1.4187
-1.3532
-1.2930


-1.2372
-1.1850
-1.1359
-1.0893
-1.0451


-1.0027
-0.9621
-0.9230
-0.8853
-0.8488


-0.8134
-0.7790
-0.7454
-0.7128
-0.6808


-0.6495
-0.6189
-0.5888
-0.5592
-0.5302


-0.5015
-0.4733
-0.4454
-0.4179
-0.3907


-2.3656
-2.0748
-1.8957
-1.7624
-1.6546


-1.5632
-1.4833
-1.4118
-1.3469
-1.2873


-1.2319
-1.1800
-1.1311
-1.0848
-1.0407


-0.9986
-0.9581
-0.9192
-0.8816
-0.8452


-0.8099
-0.7756
-0.7421
-0.7095
-0.6776


-0.6464
-0.6158
-0.5858
-0.5563
-0.5273


-0.4987
-0.4705
-0.4427
-0.4152
-0.3880


Table 10 (a) (Continued)
0.000 0.001 0.002


p


0.35 
0.36 
0.37 
0.38 
0.39 


0.40 
0.41 
0.42 
0.43 


0.45 


0.47 
0.48 
0.49 
0.50 


0.52 


0.61 
0.62 
0.63 


0.65 
0.66 
0.67 
0.68 
0.69 


0.70 
0.71 
0.72 
0.73 
0.74 


-0.3853
-0.3585
-0.3319
-0.3055
-0.2793


-0.2533
-0.2275
-0.2019
-0.1764
-0.1510


-0.1257
-0.1004
-0.0753
-0.0502
-0.0251


0.0000
0.0251
0.0502
0.0753
0.1004


0.1257
0.1510
0.1764
0.2019
0.2275


0.2533
0.2793
0.3055
0.3319
0.3585


0.3853
0.4125
0.4399
0.4677
0.4958


0.5244
0.5534
0.5828
0.6128
0.6433


-0.3826
-0.3558
-0.3292
-0.3029
-0.2767


-0.2508
-0.2250
-0.1993
-0.1738
-0.1484


-0.1231
-0.0979
-0.0728
-0.0476
-0.0226


0.0025 
0.0276 
0.0527 
0.0778 
0.1030 


0.1282 
0.1535 
0.1789
0.2045 
0.2301 


0.2559 
0.2819 
0.3081 
0.3345 
0.3611 


0.3880 
0.4152 
0.4427 
0.4705 
0.4987 


0.5273 
0.5563 
0.5858 
0.6158 
0.6464 


-0.3799
-0.3531
-0.3266
-0.3002
-0.2741


-0.2482
-0.2224
-0.1968
-0.1713
-0.1459


-0.1206
-0.0954
-0.0702
-0.0451
-0.0201


0.0050 
0.0301 
0.0552 
0.0803 
0.1055 


0.1307 
0.1560 
0.1815 
0.2070 
0.2327 


0.2585 
0.2845 
0.3107 
0.3372 
0.3638 


0.3907 
0.4179 
0.4454 
0.4733 
0.5015 


0.5302
0.5592 
0.5888 
0.6189 
0.6495 


0.003 
-0.3772
-0.3505
-0.3239
-0.2976
-0.2715


-0.2456
-0.2198
-0.1942
-0.1687
-0.1434


-0.1181
-0.0929
-0.0677
-0.0426
-0.0175


0.0075 
0.0326 
0.0577 
0.0828 
0.1080 


0.1332 
0.1586 
0.1840 
0.2096 


0.2353


0.2611 
0.2871 
0.3134 
0.3398 
0.3665 


0.3934 
0.4207 
0.4482 
0.4761 
0.5044 


0.5330 
0.5622 
0.5918 
0.6219 
0.6526 




0.004
-0.3745
-0.3478
-0.3213
-0.2950
-0.2689


-0.2430
-0.2173
-0.1917
-0.1662
-0.1408


-0.1156
-0.0904
-0.0652
-0.0401
-0.0150


0.0100 
0.0351 
0.0602 
0.0853 
0.1105 


0.1358 
0.1611 
0.1866 
0.2121 
0.2378 


0.2637 
0.2898 
0.3160 
0.3425 
0.3692 


0.3961 
0.4234 
0.4510 
0.4789 
0.5072 


0.5359 
0.5651 
0.5948 
0.6250 


0.6557 


-0.3719
-0.3451
-0.3186
-0.2924
-0.2663


-0.2404
-0.2147
-0.1891
-0.1637
-0.1383


-0.1130
-0.0878
-0.0627
-0.0376
-0.0125


0.0125 
0.0376 
0.0627 
0.0878 
0.1130 


0.1383 
0.1637 
0.1891 
0.2147 
0.2404 


0.2663 
0.2924 
0.3186 
0.3451
0.3719 


0.3989 
0.4261 
0.4538 
0.4817 
0.5101 


0.5388 
0.5681 
0.5978 
0.6280 
0.6588 


-0.3692
-0.3425
-0.3160
-0.2898
-0.2637


-0.2378
-0.2121
-0.1866
-0.1611
-0.1358


-0.1105 
-0.0853 
-0.0602 
-0.0351 
-0.0100 


0.0150 
0.0401 
0.0652 
0.0904 
0.1156 


0.1408 
0.1662 
0.1917 
0.2173 
0.2430 


0.2689 
0.2950 
0.3213 
0.3478 
0.3745


0.4016 
0.4289 
0.4565 
0.4845 
0.5129 


0.5417
0.5710 
0.6008 
0.6311 
0.6620 


-0.3665
-0.3398
-0.3134
-0.2871
-0.2611


-0.2353
-0.2096
-0.1840
-0.1586
-0.1332


-0.1080 
-0.0828 
-0.0577 
-0.0326 
-0.0075 


0.0175 
0.0426 
0.0677 
0.0929 
0.1181 


0.1434 
0.1687 
0.1942 
0.2198 
0.2456 


0.2715 
0.2976 
0.3239 
0.3505 
0.3772 


0.4043 
0.4316 
0.4593 
0.4874 
0.5158 


0.5446 
0.5740


0.6038 
0.6341 
0.6651 


-0.3638
-0.3372
-0.3107
-0.2845
-0.2585


-0.2327
-0.2070
-0.1815
-0.1560
-0.1307


-0.1055
-0.0803
-0.0552
-0.0301
-0.0050


0.0201 
0.0451 
0.0702 
0.0954 
0.1206 


0.1459 
0.1713 
0.1968 
0.2224 
0.2482 


0.2741
0.3002 
0.3266 
0.3531 
0.3799 


0.4070 
0.4344 
0.4621 
0.4902 
0.5187 


0.5476 
0.5769 
0.6068 
0.6372 


0.6682 


0.005 0.006 0.007 0.008 0.009


-0.3611
-0.3345
-0.3081
-0.2819
-0.2559


-0.2301
-0.2045
-0.1789
-0.1535
-0.1282


-0.1030
-0.0778
-0.0527
-0.0276 
-0.0025 


0.0226 
0.0476 
0.0728 
0.0979 
0.1231 


0.1484 
0.1738 
0.1993 
0.2250 
0.2508 


0.2767 
0.3029 
0.3292 
0.3558 
0.3826 


0.4097 
0.4372 
0.4649 
0.4930 
0.5215 


0.5505 
0.5799 
0.6098 
0.6403 
0.6713 





p | 0.000
0.75 | 0.6745
0.76 | 0.7063
0.77 | 0.7388
0.78 | 0.7722
0.79 | 0.8064
0.80 | 0.8416
0.81 | 0.8779
0.82 | 0.9154
0.83 | 0.9542
0.84 | 0.9945
0.85 | 1.0364
0.86 | 1.0803
0.87 | 1.1264
0.88 | 1.1750
0.89 | 1.2265
0.90 | 1.2816
0.91 | 1.3408
0.92 | 1.4051
0.93 | 1.4758
0.94 | 1.5548
0.95 | 1.6449
0.96 | 1.7507
0.97 | 1.8808
0.98 | 2.0537
0.99 | 2.3263
1.00 | ∞
Table 10 (b) 


0.001 


0.6776 
0.7095 
0.7421 
0.7756 
0.8099 


0.8452 
0.8816 
0.9192 
0.9581 
0.9986 


1.0407 
1.0848 
1.1311 
1.1800 
1.2319 


1.2873 
1.3469 
1.4118 
1.4833 
1.5632 


1.6546 
1.7624 
1.8957 
2.0748 
2.3656 


0.002


0.6808 
0.7128 
0.7454 
0.7790 
0.8134 


0.8488 
0.8853 
0.9230 
0.9621 
1.0027 


1.0451 
1.0893 
1.1359 
1.1850 
1.2372 


1.2930 
1.3532 
1.4187 
1.4909 
1.5718 


1.6646 
1.7744 
1.9110 
2.0969 
2.4089 


0.003 


0.6840 


0.7160 
0.7488 
0.7824 
0.8169 


0.8524 
0.8890 
0.9269 
0.9661 
1.0069 


1.0494 
1.0939
1.1407 
1.1901 
1.2426 


1.2988 
1.3595 
1.4255 
1.4985 
1.5805 


1.6747 
1.7866 
1.9268 
2.1201 
2.4573 


Table 10 (a) (Continued) and Table 10 (b)


0.004 


0.6871 
0.7192 
0.7521
0.7858 
0.8204 


0.8560 


0.8927 
0.9307
0.9701 
1.0110 


1.0537 
1.0985 
1.1455
1.1952 
1.2481 


1.3047 
1.3658 
1.4325 
1.5063 
1.5893 


1.6849 
1.7991 
1.9431
2.1444 
2.5121 


0.005 


0.6903 
0.7225 
0.7554 
0.7892 
0.8239 


0.8596 
0.8965 
0.9346 
0.9741 
1.0152 


1.0581 
1.1031 
1.1503 
1.2004 
1.2536 


1.3106 
1.3722 
1.4395 
1.5141
1.5982 


1.6954 
1.8119 
1.9600 
2.1701 


2.5758


0.006 


0.6935 
0.7257 
0.7588 
0.7926
0.8274 


0.8632 
0.9002 
0.9385 
0.9782 
1.0194 


1.0625 
1.1077 
1.1552 
1.2055 
1.2591 


1.3165 
1.3787 
1.4466 
1.5220 
1.6072 


1.7060 
1.8250 
1.9774 
2.1973 
2.6521 


0.007 


0.6967 
0.7290 
0.7621 
0.7961
0.8310 


0.8669 


0.9040 
0.9424 
0.9822 
1.0237 


1.0669 
1.1123 
1.1601 
1.2107 
1.2646 


1.3225 
1.3852 
1.4538 
1.5301 
1.6164 


1.7169 
1.8384 
1.9954 
2.2262 
2.7478 


0.008 


0.6999 
0.7323 
0.7655 
0.7995 
0.8345 


0.8706 
0.9078 
0.9463 
0.9863 
1.0279 


1.0714 
1.1170 
1.1650 
1.2160 
1.2702 


1.3285 
1.3917 
1.4611 
1.5382 
1.6258 


1.7279 
1.8522 
2.0141 
2.2571 
2.8782 


0.009 


0.7031 
0.7356 
0.7688 
0.8030 
0.8381 


0.8742 
0.9116 
0.9502 
0.9904 
1.0322 


1.0758 
1.1217 
1.1700 
1.2212 
1.2759 


1.3346 
1.3984 
1.4684 
1.5464 
1.6352 


1.7392
1.8663 
2.0335 
2.2904 
3.0902 


Specific Quantiles of the Standard Normal Distribution

Tabulated Values: $z_p = \Phi^{-1}(p)$, where $\Phi(z_p) = P(Z \le z_p) = \int_{-\infty}^{z_p} \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}\, dz = p$

p | 0.0005 0.001 0.005 0.01 0.025 0.05 0.10 0.20 0.30 0.40
z_p | -3.291 -3.090 -2.576 -2.326 -1.960 -1.645 -1.282 -0.842 -0.524 -0.253

p | 0.60 0.70 0.80 0.90 0.95 0.975 0.990 0.995 0.999 0.9995
z_p | 0.253 0.524 0.842 1.282 1.645 1.960 2.326 2.576 3.090 3.291
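
The quantiles in Tables 10 (a) and 10 (b) can be reproduced with Python's standard library, whose `statistics.NormalDist` class provides an inverse cumulative distribution function. The following is a minimal sketch; the wrapper name `std_normal_quantile` is an assumption made for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
_STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def std_normal_quantile(p: float) -> float:
    """Return z_p such that P(Z <= z_p) = p, for 0 < p < 1."""
    return _STANDARD_NORMAL.inv_cdf(p)


for p in (0.025, 0.05, 0.50, 0.95, 0.975, 0.995):
    print(f"p = {p:5.3f}   z_p = {std_normal_quantile(p):8.4f}")
```

Note that `inv_cdf` raises an error at p = 0 and p = 1, where the tables show -∞ and ∞. The symmetry $z_{1-p} = -z_p$ explains why Table 10 (b) mirrors the lower half of Table 10 (a).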


Table 11


Quantiles of the Chi-square Distributions


Tabulated Values: $\chi^2_{\nu;p}$, where $F(\chi^2_{\nu;p}) = P(\chi^2 \le \chi^2_{\nu;p}) = \int_{0}^{\chi^2_{\nu;p}} f(\chi^2)\, d\chi^2 = p$


ν | $\chi^2_{0.005}$ $\chi^2_{0.010}$ $\chi^2_{0.025}$ $\chi^2_{0.050}$ $\chi^2_{0.100}$ $\chi^2_{0.900}$ $\chi^2_{0.950}$ $\chi^2_{0.975}$ $\chi^2_{0.990}$ $\chi^2_{0.995}$

1 | 0.00 0.00 0.00 0.00 0.02 2.71 3.84 5.02 6.63 7.88
2 | 0.01 0.02 0.05 0.10 0.21 4.61 5.99 7.38 9.21 10.60
3 | 0.07 0.11 0.22 0.35 0.58 6.25 7.81 9.35 11.34 12.84
4 | 0.21 0.30 0.48 0.71 1.06 7.78 9.49 11.14 13.28 14.86
5 | 0.41 0.55 0.83 1.15 1.61 9.24 11.07 12.83 15.09 16.75

6 | 0.68 0.87 1.24 1.64 2.20 10.64 12.59 14.45 16.81 18.55
7 | 0.99 1.24 1.69 2.17 2.83 12.02 14.07 16.01 18.48 20.28
8 | 1.34 1.65 2.18 2.73 3.49 13.36 15.51 17.53 20.09 21.95
9 | 1.73 2.09 2.70 3.33 4.17 14.68 16.92 19.02 21.67 23.59
10 | 2.16 2.56 3.25 3.94 4.87 15.99 18.31 20.48 23.21 25.19

11 | 2.60 3.05 3.82 4.57 5.58 17.28 19.68 21.92 24.73 26.76
12 | 3.07 3.57 4.40 5.23 6.30 18.55 21.03 23.34 26.22 28.30
13 | 3.57 4.11 5.01 5.89 7.04 19.81 22.36 24.74 27.69 29.82
14 | 4.07 4.66 5.63 6.57 7.79 21.06 23.68 26.12 29.14 31.32
15 | 4.60 5.23 6.26 7.26 8.55 22.31 25.00 27.49 30.58 32.80

16 | 5.14 5.81 6.91 7.96 9.31 23.54 26.30 28.85 32.00 34.27
17 | 5.70 6.41 7.56 8.67 10.09 24.77 27.59 30.19 33.41 35.72
18 | 6.26 7.01 8.23 9.39 10.86 25.99 28.87 31.53 34.81 37.16
19 | 6.84 7.63 8.91 10.12 11.65 27.20 30.14 32.85 36.19 38.58
20 | 7.43 8.26 9.59 10.85 12.44 28.41 31.41 34.17 37.57 40.00

21 | 8.03 8.90 10.28 11.59 13.24 29.62 32.67 35.48 38.93 41.40
22 | 8.64 9.54 10.98 12.34 14.04 30.81 33.92 36.78 40.29 42.80
23 | 9.26 10.20 11.69 13.09 14.85 32.01 35.17 38.08 41.64 44.18
24 | 9.89 10.86 12.40 13.85 15.66 33.20 36.42 39.36 42.98 45.56
25 | 10.52 11.52 13.12 14.61 16.47 34.38 37.65 40.65 44.31 46.93





Table 11 (Continued)


ν


26
27
28
29
30


31
32
33
34
35


36
37
38
39
40


45
50
55
60
65


70
75
80
85
90


95
100


$\chi^2_{0.005}$


11.16 
11.81 
12.46 
13.12 
13.79 


14.46 
15.13 
15.82 
16.50 
17.19 


17.89 
18.59 
19.29
20.00 


20.71


24.31 
27.99 
31.73 
35.53


39.38


43.28 
47.21 
51.17 
55.17 
59.20


63.25
67.33


$\chi^2_{0.010}$


12.20 
12.88 
13.56 
14.26 
14.95 


15.66 
16.36 
17.07 
17.79 
18.51 


19.23 
19.96 
20.69 
21.43 
22.16 


25.90 
29.71 
33.57
37.48 
41.44 


45.44 
49.48 
53.54 
57.63
61.75 


65.90 
70.06 


$\chi^2_{0.025}$
13.84 
14.57 
15.31 
16.05
16.79 


17.54 
18.29 
19.05 
19.81 
20.57 


21.34 
22.11 
22.88 
23.65 
24.43 


28.37 
32.36 
36.40 
40.48 
44.60 


48.76 
52.94 
57.15 
61.39 
65.65 


69.92 
74.22 


$\chi^2_{0.050}$


15.38 
16.15 
16.93 
17.71 
18.49 


19.28 
20.07 
20.87 
21.66 
22.47 


23.27 
24.07 
24.88 
25.70 
26.51 


30.61 
34.76
38.96 
43.19 
47.45 


51.74 
56.05 
60.39 
64.75 
69.13 


73.52 
77.93 




$\chi^2_{0.100}$


17.29 
18.11 
18.94 
19.77 
20.60 


21.43 
22.27
23.11 
23.95 
24.80 


25.64 
26.49 
27.34 
28.20 
29.05 


33.35 
37.69 
42.06 
46.46 
50.88 


55.33 
59.79 
64.28 
68.78 
73.29 


77.82 


82.36 


$\chi^2_{0.900}$
35.56 
36.74 
37.92 
39.09 
40.26 


41.42 
42.58 
43.75 
44.90 
46.06 


47.21
48.36 
49.51 
50.66 
51.81 


57.51 
63.17 
68.80 
74.40 
79.97 


$\chi^2_{0.950}$
38.89 
40.11 
41.34 
42.56 
43.77 


44.99 
46.19 
47.40 
48.60 
49.80 


51.00 
52.19 
53.38 
54.57
55.76 


61.66 
67.50 
73.31 
79.08 
84.82 


$\chi^2_{0.975}$


41.92 


43.19


44.46 
45.72 
46.98 


48.23 
49.48 
50.73 
51.97 
53.20 


54.44 
55.67 
56.90 
58.12
59.34 


65.41 
71.42 
77.38 
83.30 


89.18


95.02 
100.84 
106.63 
112.39
118.14 


123.86 


129.56 





$\chi^2_{0.990}$


45.64 


46.96 


48.28 
49.59 
50.89 


52.19 
53.49 
54.78 
56.06 
57.34 


58.62 
59.89 
61.16 
62.43 
63.69 


69.96 
76.15 
82.29 
88.38 
94.42 


100.43 
106.39 


$\chi^2_{0.995}$


48.29 
49.65 
50.99 
52.34 
53.67 


55.00 
56.33 
57.65 
58.96 
60.27 


61.58 
62.88 
64.18 
65.48 
66.77 


73.17 
79.49 
85.75 
91.95 
98.10


104.21 
110.29 


112.33 116.32 


118.24 


122.32 


124.12 128.30 


129.97 134.25 


For values of ν > 100, the quantity $\sqrt{2\chi^2}$ may be taken as normally distributed about mean $\sqrt{2\nu - 1}$ with unit variance.


135.81 140.17 
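
The note to Table 11 on large degrees of freedom can be turned into a working approximation. If $\sqrt{2\chi^2}$ is approximately normal with mean $\sqrt{2\nu - 1}$ and unit variance, then $\chi^2_{\nu;p} \approx \frac{1}{2}\left(z_p + \sqrt{2\nu - 1}\right)^2$, where $z_p$ is the standard normal quantile. The sketch below illustrates this under that assumption; the function name `chi2_quantile_large_df` is hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile_large_df(p: float, nu: float) -> float:
    """Approximate chi-square quantile for large nu via the normal approximation.

    Uses sqrt(2 * chi2) ~ Normal(sqrt(2 * nu - 1), 1), as stated in the table note.
    """
    z_p = NormalDist().inv_cdf(p)          # standard normal quantile z_p
    return 0.5 * (z_p + math.sqrt(2.0 * nu - 1.0)) ** 2


# The note recommends the approximation for nu > 100
for nu in (120, 150, 200):
    print(nu, round(chi2_quantile_large_df(0.95, nu), 2))
```

Even at ν = 100 the approximation is close: it gives roughly 124.1 for p = 0.95, against an exact value of about 124.3, and the agreement improves as ν increases.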









Table 12


Quantiles of the Student's t-Distributions
Tabulated Values: $t_{\nu;p}$







$t_{0.0005}$


-636.58
-31.600
-12.924
-8.610
-6.869


-5.959
-5.408
-5.041
-4.781
-4.587


-4.437
-4.318
-4.221
-4.140
-4.073


-4.015
-3.965
-3.922
-3.883
-3.850


-3.819
-3.792
-3.768
-3.745
-3.725


-3.707
-3.689
-3.674
-3.660
-3.646


-3.591
-3.551
-3.520
-3.496
-3.460


-3.435
-3.416
-3.402
-3.390
-3.340


-3.310
-3.300
-3.290


$t_{0.001}$


-318.29
-22.328
-10.214
-7.173
-5.894


-5.208
-4.785
-4.501
-4.297
-4.144


-4.025
-3.930
-3.852
-3.787
-3.733


-3.686
-3.646
-3.610
-3.579
-3.552


-3.527
-3.505
-3.485
-3.467
-3.450


-3.435
-3.421
-3.408
-3.396
-3.385


-3.340
-3.307
-3.281
-3.261
-3.232


-3.211
-3.195
-3.183
-3.174
-3.131


-3.107
-3.098
-3.090


$t_{0.005}$


-63.656
-9.925
-5.841
-4.604
-4.032


-3.707
-3.499
-3.355
-3.250
-3.169


-3.106
-3.055
-3.012
-2.977
-2.947


-2.921
-2.898
-2.878
-2.861
-2.845


-2.831
-2.819
-2.807
-2.797
-2.787


-2.779
-2.771
-2.763
-2.756
-2.750


-2.724
-2.704
-2.690
-2.678
-2.660


-2.648
-2.639
-2.632
-2.626
-2.601


-2.586
-2.581
-2.576


$t_{0.010}$


-31.821
-6.965
-4.541
-3.747
-3.365


-3.143
-2.998
-2.896
-2.821
-2.764


-2.718
-2.681
-2.650
-2.624
-2.602


-2.583
-2.567
-2.552
-2.539
-2.528


-2.518
-2.508
-2.500
-2.492
-2.485


-2.479
-2.473
-2.467
-2.462
-2.457


-2.438
-2.423
-2.412
-2.403
-2.390


-2.381
-2.374
-2.368
-2.364
-2.345


-2.334
-2.330
-2.326


$t_{0.025}$


-12.706
-4.303
-3.182
-2.776
-2.571


-2.447
-2.365
-2.306
-2.262
-2.228


-2.201
-2.179
-2.160
-2.145
-2.131


-2.120
-2.110
-2.101
-2.093
-2.086


-2.080
-2.074
-2.069
-2.064
-2.060


-2.056
-2.052
-2.048
-2.045
-2.042


-2.030
-2.021
-2.014
-2.009
-2.000


-1.994
-1.990
-1.987
-1.984
-1.972


-1.965
-1.962
-1.960


$t_{0.050}$


-6.314
-2.920
-2.353
-2.132
-2.015


-1.943
-1.895
-1.860
-1.833
-1.812


-1.796
-1.782
-1.771
-1.761
-1.753


-1.746
-1.740
-1.734
-1.729
-1.725


-1.721
-1.717
-1.714
-1.711
-1.708


-1.706
-1.703
-1.701
-1.699
-1.697


-1.690
-1.684
-1.679
-1.676
-1.671


-1.667
-1.664
-1.662
-1.660
-1.653


-1.648
-1.646
-1.645


$F(t_{\nu;p}) = P(T \le t_{\nu;p}) = \int_{-\infty}^{t_{\nu;p}} f(t)\, dt = p$


$t_{0.100}$


-3.078
-1.886
-1.638
-1.533
-1.476


-1.440
-1.415
-1.397
-1.383
-1.372


-1.363
-1.356
-1.350
-1.345
-1.341


-1.337
-1.333
-1.330
-1.328
-1.325


-1.323
-1.321
-1.319
-1.318
-1.316


-1.315
-1.314
-1.313
-1.311
-1.310


-1.306
-1.303
-1.301
-1.299
-1.296


-1.294
-1.292
-1.291
-1.290
-1.286


-1.283
-1.282
-1.282


$t_{0.200}$


-1.376
-1.061
-0.978
-0.941
-0.920


-0.906
-0.896
-0.889
-0.883
-0.879


-0.876
-0.873
-0.870
-0.868
-0.866


-0.865
-0.863
-0.862
-0.861
-0.860


-0.859
-0.858
-0.858
-0.857
-0.856


-0.856
-0.855
-0.855
-0.854
-0.854


-0.852
-0.851
-0.850
-0.849
-0.848


-0.847
-0.846
-0.846
-0.845
-0.843


-0.842
-0.842
-0.842




Table 12 (Continued)


ν | $t_{0.800}$

1 | 1.376
2 | 1.061
3 | 0.978
4 | 0.941
5 | 0.920

6 | 0.906
7 | 0.896
8 | 0.889
9 | 0.883
10 | 0.879

11 | 0.876
12 | 0.873
13 | 0.870
14 | 0.868
15 | 0.866

16 | 0.865
17 | 0.863
18 | 0.862
19 | 0.861
20 | 0.860

21 | 0.859
22 | 0.858
23 | 0.858
24 | 0.857
25 | 0.856

26 | 0.856
27 | 0.855
28 | 0.855
29 | 0.854
30 | 0.854

35 | 0.852
40 | 0.851
45 | 0.850
50 | 0.849
60 | 0.848

70 | 0.847
80 | 0.846
90 | 0.846
100 | 0.845
200 | 0.843

500 | 0.842
1000 | 0.842
∞ | 0.842


$t_{0.900}$


3.078 
1.886 
1.638 
1.533
1.476 


1.440 
1.415 
1.397 
1.383 
1.372 


1.363 
1.356 
1.350 
1.345 
1.341 


1.337 
1.333 
1.330 
1.328 
1.325 


1.323 
1.321 
1.319 
1.318 
1.316 


1.315 
1.314 
1,313 
1.311 
1.310 


1.306
1.303
1.301
1.299 
1.296 


1.294
1.292 
1.291 
1.290 
1.286 


1.283 
1.282 


$t_{0.950}$


6.314 
2.920 
2.353 
2.132 
2.015 


1.943
1.895 
1.860 
1.833 
1.812 


1.796 
1.782 
1.771
1.761 
1.753 


1.746 
1.740 
1.734 
1.729 
1.725 


1.721 
1.717 
1.714 
1.711
1.708 


1.706 
1.703
1.701 
1.699 
1.697 


1.690
1.684
1.679 
1.676 
1.671 


1.667 


1.664 


1.662 
1.660 
1.653 


1.648




$t_{0.975}$


12.706 
4.303 
3.182 
2.776 
2.571 


2.447 
2.365 
2.306 
2.262 
2.228 


2.201 
2.179 
2.160 
2.145 
2.131 


2.120 
2.110 
2.101 
2.093 
2.086 


2.080 
2.074 
2.069 
2.064 
2.060 


2.056 
2.052 
2.048 
2.045 
2.042 


2.030 
2.021 


2.014 


2.009 
2.000 


1.994
1.990 
1.987 
1.984 
1.972


1.965 


$t_{0.990}$


31.821 
6.965 
4.541
3.747 
3.365 


3.143 
2.998 
2.896 
2.821 
2.764 


2.718 
2.681 
2.650 
2.624 
2.602 


2.583 
2.567 
2.552 
2.539 
2.528 


2.518 
2.508 
2.500 
2.492 
2.485 


2.479 
2.473 
2.467 
2.462 
2.457 


2.438 
2.423 
2.412 
2.403 
2.390 


2.381 
2.374 
2,368 
2.364 
2.345 


RK oe 
2.330 


£9995 


63.656 
9.925 
5.841 
4.604 
4.032 


3.707 
3.499 
3.355 
3.250 
3.169 


3.106 
3.055 
3.012 
2.977 
2.947 


2.921 
2.898 
2.878 
2.861 
2.845 


2.831 
2.819 
2.807 
2.797 
2.787 


2.779 
2.771 
2.763 
2.756 
2.750 


2.724 
2.704 
2.690 
2.678 
2,660 


2.648 
2.639 
2.632 
2.626 
2.601 


£9999 


318.29 
22.328 
10.214 
7.173 
5.894 


5.208 
4.785 
4.501 
4.297 
4.144 


4.025 
3.930 
3.852 
3.787 
3.733 


3.686 
3.646 
3,610 
3.579 
3.552 


3.527 
3.505 
3.485 
3.467 
3.450 


3.435 
3.421 
3.408 
3.396 
3.385 


19.9995 


636.58 
31.600 
_ 12.924 
8.610 
6.869 


5.959 
5.408 
5.041 
4.781 
4.587 


4.437 
4.318 
4.221 
4.140 
4.073 


4.015 
3.965 
3,922 
3.883 
3.850 


3.819 
3.792 
3.768 
3.745 
3.725 


3.707 


3.674 
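
Entries of Table 12 can be spot-checked numerically. The following is a minimal sketch, not part of the original tables. It assumes that a column heading such as t0.975 denotes the value t with P(T ≤ t) = 0.975 for ν degrees of freedom. The function names `t_pdf` and `t_cdf` are illustrative, and Simpson's rule is used only as a convenient integration method.

```python
import math


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def t_cdf(t, nu, steps=2000):
    """P(T <= t), by Simpson's rule on [0, |t|] plus symmetry about zero."""
    if t < 0:
        return 1.0 - t_cdf(-t, nu, steps)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3


# Compare a few tabulated values with the probability in their column heading.
for nu, t, p in [(10, 2.228, 0.975), (1, 6.314, 0.950), (4, 1.533, 0.900)]:
    print(nu, t, p, round(t_cdf(t, nu), 4))
```

Each printed probability should agree with its column heading to about three decimal places, since the tabulated values are themselves rounded to three decimals.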




















Table 15
Random Digits
(Blocked merely for convenience)

51111  60012  73169  63212  46285  37542  21616  74487
63523  40697  91282  68120  94872  73262  22126  93500
02991  62570  65237  61795  95791  43326  80040  72265
89738  36786  01206  59202  46772  99176  87792  79475
87774  17396  34160  57570  01756  60142  80909  01918

56926  46055  16021  85746  92962  -----  36479  99619
89382  91375  67039  94735  42556  31913  -----  39872
88580  42057  25479  30229  33192  50649  24974  02352
97687  24984  39260  43521  09995  88876  47985  07695
92602  28542  04512  94980  -----  37019  20743  50808

08978  70087  20563  97243  00593  33371  94475  20863
83206  86175  36552  47171  40389  17157  34321  -----
91653  10865  67832  48586  80390  36501  29560  17288
-----  87801  66190  83858  45838  24976  47020  95085
10635  61930  15186  68532  76689  97944  10970  42695

72693  93372  04634  18810  41336  82463  54823  98263
04689  64783  67992  33700  46813  84656  72103  41266
16625  91809  65745  82325  62054  17324  78129  34381
79042  24811  28281  35789  40755  33337  99974  85223
43350  35640  42482  67223  11246  78154  76075  37544

20939  29266  13007  20091  64197  16443  97779  14942
54980  68981  45638  90781  25282  96238  56215  29242
91642  99423  82242  10390  70643  44658  97192  23376
83634  32671  64868  80365  80733  64605  90427  32963
48621  89555  03308  74958  37645  40607  59152  64219

58239  31378  81652  03074  77526  91014  68936  60941
57648  69646  74784  66776  27930  04533  14589  09153
38550  65767  52579  25103  67376  60452  48108  80838
92158  97699  48018  89953  76630  23043  95199  06153
18829  30585  69139  67181  09902  09218  36717  20011

99255  09599  34793  81877  06984  19701  89401  74699
29474  79815  58182  91065  46339  45712  74215  -----
81283  42417  68463  70505  99724  67320  -----  60426
37412  11624  18164  05530  01983  36286  -----  07252
24643  29187  23665  01038  64170  50433  88716  47354
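
One common way to use such a table is sketched here, under assumptions that the table itself does not state. To draw a simple random sample from a population whose units are numbered 01 to N (with N at most 99), read the digits in pairs, skip 00 and any pair greater than N, and skip repeats. The helper `sample_from_digits`, the population size and the sample size are hypothetical. The digits used are the entries 63523, 02991, 89738, 87774 of Table 15.

```python
def sample_from_digits(digits, population_size, sample_size, width=2):
    """Pick distinct unit labels 1..population_size by reading digits in groups."""
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        # Skip labels outside the population and labels already selected.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
            if len(chosen) == sample_size:
                break
    return chosen


# Four consecutive five-digit blocks read from the table, joined into one string.
blocks = ["63523", "02991", "89738", "87774"]
digits = "".join(blocks)

# Hypothetical setting: population of 80 numbered units, sample of 5.
print(sample_from_digits(digits, population_size=80, sample_size=5))
```

The blocking into groups of five plays no role in the selection, in line with the note that the digits are blocked merely for convenience.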


Table 15 (Continued)
Random Digits

31531  48095  01630  57115  44739  26312  68345  68221
59292  16215  51040  99934  74608  96802  38402  73813
34172  04973  01793  67995  25390  83861  31324  22329
55867  54715  23927  80668  84573  36067  22915  95681
76360  72976  01760  44282  98731  83555  12183  09803

50418  36528  69287  59951  82973  27655  83915  11430
29733  41389  77388  35630  93351  77534  65027  35853
38103  95561  49260  82008  20327  25049  31321  14642
04223  99352  09073  84463  59064  76376  06764  66908
40722  28253  23527  57504  91017  88325  69274  66864

26099  76287  45241  45147  88001  28415  21056  88190
95426  82300  16743  18001  39763  98192  43383  95093
40001  45549  61549  91297  05432  92682  62538  80338
31005  19057  68167  89797  15253  97775  21658  45135
96511  88634  18992  86894  36575  10265  71836  44024

20508  44694  83129  85287  27277  86908  88233  06921
38727  33012  47645  30384  11018  20368  31151  38065
23972  68014  93315  88123  16285  86222  34871  09893
63236  19222  94061  92027  07163  28134  15994  70102
01115  38537  82434  18506  77530  79390  71266  70547

78952  21090  74575  92675  97835  36745  68946  29473
21570  11526  01095  54467  74764  79599  35026  51879
64788  76012  20007  16220  69643  17339  18589  43843
76881  06219  22328  18374  33474  77195  11584  20068
47005  54849  32483  71661  49193  80388  75115  77841

42034  83668  89405  40782  37789  58788  54635  27077
55862  47528  68543  45201  56625  11720  06488  21503
37581  03756  61884  23279  39380  05006  52557  90889
68477  70249  61999  98149  38343  48367  29498  70442
09070  04575  95717  90053  49422  19690  69543  09691

35665  61707  46988  87555  76487  33611  05324  72969
24142  31078  68713  95984  10349  92823  43623  26740
65680  88011  55416  41472  85925  -----  52339  58849
92463  28512  10051  56376  11857  62148  70168  10288
18327  35770  74715  66557  -----  21912  17073  06014


























Table 15 (Continued)
Random Digits

64119  11674  34433  80653  -----  17230  -----  -----
99487  84735  18567  06167  25868  89254  44284  36531
33487  12940  81066  03932  22995  63819  93008  91036
59084  16322  61468  10892  45293  92002  61615  63042
28392  11748  93258  38469  66641  21238  89783  73263

18843  06459  38892  16345  52877  37816  64429  74941
16775  44189  86040  70478  08742  51497  72241  -----
-----  38833  35809  56718  88865  89718  64221  -----
70428  14106  67084  16900  57991  27293  24933  67225
75540  43367  66893  62172  52680  29871  43473  92425

55011  81643  83608  00358  67205  90420  21093  81661
45256  85777  49392  34030  01333  73306  37140  29328
27450  75061  90673  87307  61117  02154  45530  -----
83211  90459  11077  61617  45434  74188  74107  04174
01081  95300  45883  -----  20818  -----  27402  17652

43733  66312  89736  61899  71169  44481  02216  24818
-----  83057  -----  03599  84502  81021  05600  39627
88906  -----  86180  25240  -----  76362  77569  33507
01781  16820  81576  60679  92598  60516  24450  50498
86319  82730  96410  23505  89424  73665  06425  41488

15142  70090  66357  39584  27701  19869  91501  20927
47858  87100  82344  27498  62815  31773  71180  04575
50841  64632  02451  73548  87668  53544  54121  75902
72075  27307  77034  98096  39624  20637  45372  88899
-----  76957  65153  56256  -----  -----  48864  73643

29024  82572  64194  37612  12688  52138  38805  63458
16510  51461  58797  41945  54267  81621  -----  01977
55930  46033  07817  48020  63293  85080  17929  13394
93483  24150  20013  58713  85030  63067  68492  16757
81151  20032  06443  98872  85398  54162  46466  20674

69469  87910  96381  79558  -----  07147  41653  42271
83151  73238  34258  66251  25888  63555  30446  96100
64963  05580  18568  61200  61639  24896  06224  32626
22341  49421  88538  38456  16923  17510  91239  24956
76616  74956  71033  44945  -----  64612  01573  68408


Table 15 (Continued ) 





62849 
06811 
67843 
11790 
64049 


89196 
16921
20268 
42626 
97655 


59851
53223
61374 
30394 
01788 


40999 
47470 
84513 
14877 
94907 


84030 
96852 
82112 
20209 
30793 


64223 
00229 
44690 
74392 
01635 


790308 
12963 
13343 
52959 
76422 


00522 
91997 
71012 
61854 
85624


92205 
39390 
78442 
67896 
11384 


19594 
97891 
mS J fi 
63965 
53424 


74948 
31792 
28834 
08250 
32201


89987
67153 
87617 
92328 
14699 


27958
98223 
06767 
33377 
19198 


52913
96967 
6992% 
24318 
73257


32055 
035 20 3 
wi Sling, 
25065 
94564 


48557 
40432 
13053 
24331
03388 


36606 
14874 
90149 
95392 
oe. fee 1 2 Be 


16189 
39382 
76355 
42031 
11546 


44352 
23335 
19769 
68970 
61789 


27404 
51879 
99785 
69104 
1305) 


92908 
16547 
92789
79204
9ABRD 


Random Digits


29561
86806
11850 
68551 
31454 


25271 
57520 
72094 
40302 
21018 


04785 
26879 
22485 
§ 2832 
89682 


29405 
60203 
83630 
54551 
49077 


44464 
33979 
71019 
37019 
41828 


01566 
76770 
§3321 
88737 
1621 


372948 


dan co Nw me 
icon 
Juana a 
nik we 


A 
} 
Th 
au. 


§ 89400 | 


4 7 1% — 


SH517 
KT 265 
85739
90495 
7H#O012% 


66936 
875072 
28927 
10069 
86702


66944 
95552 
88936% 
21216 
00349 


22474 
84439 
64225 
71282 
66448 


$18 
433567 
29265 
75233 
96823 


51826 
TIPA2 
42019 
12944 


Ae 
LGYGS 
49I9ZY 
82496
O74 


46962 
Sis 
17295 
60936
62725603 


9SS6S 
99342 
86064
P2066 
7 $4 26 


G4 7276 
& TSOP 
9 DLR 
$27 FA 


TRESD 


2 ha Ms 
SETRS 
L&R SD 
SHKDD 
372 aMS 


EDERD 
QAR 


BHERR 7 


BBSSSE 































Ve ee CPL 
GE CL PPS 
GZUSP PPOIY 
SGEP. OPPS 
$4SS7 Svos 


FISSFT HP yKST 
SETI) SIRF 
SECTS 7 ESSE 
S7TESE (ST es 
16909 CES7Te 


TOSSES 1TH 
STisey SaeeR 
TLE Se Teese 
T732LHS 2539S 
RITES HET HD 


Ley SOAR 
FETS Ba FOr 
OT ake DESSS 
PPT Se HORT S= 
REKOT MIS 


RDUTE ETS 
THEE SHES 
RAR th PASE 
PARK PEZWRS 
LT EAD YS ERS 




11590 A92Q% AGMA 
BA2K2 ROS \ Ba 


| SHALL BBRY 
O4HLA HAMS 


94939 


BARA 


Ass 
















" ‘Table 15 (Continued ) 


Random Digits

40887  08843  04158  17766  62048  12460  67670  74160
39769  98382  42082  74472  65978  83223  15014  85530
19071  77964  -----  11438  43497  43734  92867  97585
11872  05718  35665  37597  16967  99887  17300  73206
18585  93417  38142  23635  33162  62992  40636  45108

08873  45777  86663  21681  27600  76307  21629  72281
86461  23620  76270  06756  86623  -----  19009  29505
39029  73019  40640  26037  48449  11424  84280  49349
80117  27669  -----  -----  22612  03658  91474  33821
63750  55853  99135  94259  06904  30371  60548  11847

07933  53878  47608  20735  95654  37742  78738  05297
90675  35073  -----  02817  93162  -----  07621  83393
53674  32002  85103  63110  06773  82875  56382  83653
85478  97431  16401  97445  -----  62706  52829  75290
36897  27107  94133  85891  81015  33897  10552  10327

98500  76027  -----  60626  43973  58516  11768  55230
37490  45221  23528  31596  76135  98432  36975  24477
-----  23468  28295  24608  16619  92676  60771  85960
26995  12579  -----  87529  05453  48051  58565  13699
47418  24554  98181  92334  77016  80723  76616  92913

40921  34239  08148  85563  10233  66315  02912  08024
75463  02620  43318  73497  74035  85246  03539  51573
03820  97524  70314  08956  66818  93279  -----  16494
48752  85933  -----  65075  99696  57472  69090  12551
75275  97488  50896  13800  08785  29256  18768  16758

32521  35461  01764  26246  39689  79662  39141  91204
32933  29941  78600  67557  84177  88316  04789  59504
36958  28914  -----  49889  72203  62534  38924  93945
32470  85046  67117  38300  09237  06639  63399  86652
62177  87069  80588  74671  03078  43562  03669  -----

28633  08504  02639  29316  44232  39546  86271  30435
80557  84859  47794  99929  37987  94291  77088  50814
45958  50321  14660  96664  91769  54657  26878  09854
41041  89788  80677  06772  58746  77404  08983  36131
30208  02012  95331  85327  13841  60117  57560  16089


